I taught machines to read newspapers, gave them 250 years of data, extracted everything (6 million+ stories so far), separated the ads from the content, and categorized it all. You can search semantically or with your own AI research assistant, get the actual articles with full-text extraction, and build and share collections. As far as I know, this has never been done before; the data isn't on Google or in any LLM, only on SNEWPapers.
Hey Product Hunt! 👋
I'm excited to share SNEWPapers, the world's first AI-powered historical newspaper archive. We've read and organized 6 million+ stories from 250 years of American newspapers (1730s–1960s) so you can finally explore history by meaning, not just broken keyword search.
Maybe the best thing since sliced bread for digital humanists, historians, researchers, and genealogists?
I built this after trying to research references in The Fourth Turning. Traditional archives dumped me into faded page scans with terrible search. So I created my own.
The result: clean, summarized articles with near-perfect full-text OCR extraction, plus The Sleuth (your personal AI research assistant), smart categorization (24 categories / 1,000+ sub-categories), shareable Collections, and a fun Today in History daily feed.
Quick start (10 minutes): → Tutorials
A few things I’d love your thoughts on:
Today in History — Would you actually open this daily?
Search + Sleuth — How useful is semantic search and the AI assistant for your research?
Collections — Would you use/share public collections?
Pricing: 7-day free trial. I priced it ~50% below traditional archives because we actually deliver usable, intelligent access. Product Hunt special: Use PRODUCTHUNT20 for 20% off any plan (valid until May 8).
Huge technical journey. I had to figure out how to acquire, store, and process nearly a million high-resolution newspaper images, build custom multi-modal systems to detect and segment articles, massively improve OCR on centuries-old ink, train models to understand newspaper layout and context, run prompt engineering at scale, balance cost vs. quality with LLMs and vLLMs, build semantic and agentic search infrastructure that actually works on millions of documents, and scale a cost-effective GPU fleet.
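To make the "detect, segment, OCR" step concrete, here is a minimal sketch of the page-to-articles stage. It is illustrative only: it assumes pytesseract for OCR, and segment_page() is a hypothetical stand-in for the custom multi-modal segmentation model described above, not the actual SNEWPapers stack.

```python
from dataclasses import dataclass
from PIL import Image
import pytesseract

@dataclass
class Article:
    bbox: tuple        # (left, top, right, bottom) in page pixels
    raw_text: str      # OCR output, before any LLM cleanup or summarization

def segment_page(page: Image.Image) -> list[tuple]:
    """Hypothetical stand-in for a trained layout model: naively split the
    page into fixed vertical columns, roughly how many historical
    broadsheets were laid out."""
    w, h = page.size
    cols = 4
    return [(i * w // cols, 0, (i + 1) * w // cols, h) for i in range(cols)]

def extract_articles(page_path: str) -> list[Article]:
    # Grayscale conversion tends to help Tesseract on faded, low-contrast ink.
    page = Image.open(page_path).convert("L")
    articles = []
    for bbox in segment_page(page):
        text = pytesseract.image_to_string(page.crop(bbox))
        articles.append(Article(bbox=bbox, raw_text=text))
    return articles
```

In a real pipeline a learned detector would replace segment_page() and raw_text would go through an LLM cleanup pass; a similar sketch of the search side follows the stats below.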
Some “AWS-ish” stats so far:
115,000+ GPU GB-hours (OCR / Layouts)
26,000+ Lambda GB-hours moving data around
44.7 billion LLM/vLLM tokens processed
7 months of 80+ hour work weeks (organic neural network compute)
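On the "semantic search that actually works on millions of documents" claim, the core mechanic is embedding articles and queries into the same vector space and ranking by similarity. Below is a toy sketch using sentence-transformers and FAISS as stand-in choices (the post doesn't say which embedding model or vector store SNEWPapers actually uses), with made-up headlines as data.

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy corpus: in production this would be millions of extracted articles.
articles = [
    "GOLD DISCOVERED AT SUTTER'S MILL -- miners rush westward",
    "New steam locomotive sets speed record on the B&O line",
    "Cholera outbreak prompts city to open quarantine hospital",
]
vecs = model.encode(articles, normalize_embeddings=True).astype("float32")

# Inner product on unit-normalized vectors equals cosine similarity.
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

query = model.encode(["epidemic and public health"],
                     normalize_embeddings=True).astype("float32")
scores, ids = index.search(query, 2)
for rank, (i, s) in enumerate(zip(ids[0], scores[0]), start=1):
    print(f"{rank}. ({s:.2f}) {articles[i]}")  # expect the cholera story first
```

At archive scale you would swap IndexFlatIP for an approximate index (IVF or HNSW) behind a proper vector store, but the ranking logic is the same.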
Would love your honest feedback and discoveries you make in the archive! 🫡 (here or [email protected])
Very interesting! A few things: (1) Is this just the LOC collection of papers, Chronicling America, or are you getting papers from elsewhere too? I've got a few ideas for additional sources. (2) I'd love to know what tech stack you settled on for the OCR/VLM work — I do a lot of work with 19th-c U.S. newspapers and my quest is to figure out the perfect pipeline/workflow. (3) FYI, I just signed up for an account and it immediately told me: "Your free trial has expired. Choose a plan to unlock all features." You might want to change that language ("To start your 7-day free trial, choose a plan..."), and you might want to offer some free searches (without access to the results) so people can see whether the content they're interested in is in there. That's what the British Newspaper Archive does — you try a search, see there are a few golden documents you really want, and then they ask you to pay.
I have a crazy idea: there are many lost treasures in the world that were never found. In theory, if all printed materials (newspapers, books, etc.) from those times and countries were digitized, then AI could help find them. Do you think this is realistic?
Incredible scale! You mentioned training models to handle degraded paper and faded ink. Google famously used reCAPTCHA v1 for the same problem, having millions of users unknowingly label words from old NYT archives. How did you cope with this issue?
Honestly, this is quite cool!
Do you plan to expand the archive to newspapers from other countries?
About SNEWPapers on Product Hunt
“The World's First AI Newspaper Archive”
SNEWPapers launched on Product Hunt on April 27th, 2026 and earned 122 upvotes and 21 comments, placing #10 on the daily leaderboard.
SNEWPapers was featured in Education (78.4k followers), Artificial Intelligence (467.2k followers) and Data & Analytics (5.6k followers) on Product Hunt. Together, these topics include over 119.4k products, making this a competitive space to launch in.
Who hunted SNEWPapers?
SNEWPapers was hunted by Brett Shinnebarger. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.