Sup AI
AI ensemble that scored #1 on Humanity's Last Exam
Productivity
Writing
Artificial Intelligence

Upvotes: 105

Comments: 7

Featured on April 7th, 2026

Hunted by

Ken Mueller

Every LLM hallucinates. They just don't hallucinate the same things. Sup AI runs multiple LLMs (out of 339) in parallel, then synthesizes answers by measuring confidence on every segment. High entropy = likely hallucination, downweighted. Low entropy = likely accurate, amplified. Result: 52.15% on Humanity's Last Exam, 7.41 points ahead of any individual model. $10 starter credit. Card verified. No auto-charge.

Top comment

Product of the Day: 15th

Hey Product Hunt. I'm Ken, a 20-year-old Stanford CS student. I built Sup AI.
I started working on this because no single AI model is right all the time, but their errors don’t strongly correlate. In other words, models often make unique mistakes relative to other models. So I run multiple models in parallel and synthesize the outputs by weighting segments based on confidence. Low entropy in the output token probability distributions correlates with accuracy. High entropy is often where hallucinations begin.
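To make the weighting idea concrete, here is a minimal sketch of entropy-weighted voting over one answer segment. It is illustrative only and is not Sup AI's actual synthesis code: the `pick_segment` function, the `(text, mean_token_entropy)` input shape, the `exp(-entropy / temperature)` weighting, and the example values are all assumptions made for this example.

```python
import math
from collections import defaultdict

def pick_segment(candidates, temperature=1.0):
    """Pick one segment text from several models' candidates.

    candidates: list of (text, mean_token_entropy) pairs, one per model.
    Each model votes with weight exp(-entropy / temperature), so
    low-entropy (confident) outputs count more than high-entropy ones.
    This weighting is a hypothetical choice for illustration.
    """
    totals = defaultdict(float)
    for text, entropy in candidates:
        totals[text] += math.exp(-entropy / temperature)
    # Return the text with the largest accumulated weight.
    return max(totals, key=totals.get)

# Hypothetical outputs from three models for the same segment.
candidates = [
    ("Paris", 0.05),  # model A, confident
    ("Paris", 0.40),  # model B, fairly confident
    ("Lyon", 2.10),   # model C, uncertain
]
print(pick_segment(candidates))
```

With this weighting, a single confident model can outvote several uncertain ones, which is the property the approach relies on when model errors are only weakly correlated.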
My dad Scott (AI Research Scientist at TRI, PhD from UCLA) is my research partner on this. He sends me papers at all hours, we argue about whether they actually apply and what modifications make sense, and then I build and test things. The entropy-weighting approach came out of one of those conversations.
In our eval on Humanity's Last Exam, Sup scored 52.15%. The best individual model in the same evaluation run got 44.74%. The relative gap is statistically significant (p < 0.001).
Methodology, eval code, data, and raw results:
https://sup.ai/research/hle-white-paper-jan-9-2026
https://github.com/supaihq/hle
Limitations:
- We evaluated 1,369 of the 2,500 HLE questions (details in the above links)
- Not all APIs expose token logprobs; we use several methods to estimate confidence when they don't
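One common way to estimate confidence without logprobs is to sample the same prompt several times and measure how much the answers disagree. The sketch below shows that idea under simplifying assumptions; it is not confirmed to be one of the methods Sup AI uses. The `agreement_entropy` function, the lowercase-and-strip normalization, and the sample answers are hypothetical.

```python
import math
from collections import Counter

def agreement_entropy(answers):
    """Entropy (in nats) of the empirical distribution over sampled answers.

    Answers are normalized by stripping whitespace and lowercasing, which is
    a crude stand-in for grouping answers that mean the same thing.
    0.0 means every sample agreed; larger values mean more disagreement.
    """
    counts = Counter(a.strip().lower() for a in answers)
    n = len(answers)
    return sum((c / n) * math.log(n / c) for c in counts.values())

# Hypothetical repeated samples for one question.
consistent = ["42", "42", " 42", "42"]
scattered = ["42", "17", "108", "3"]
print(agreement_entropy(consistent))
print(agreement_entropy(scattered))
```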
We tried offering free access and it got abused so badly it nearly killed us. Right now the sustainable option is a $10 starter credit with card verification (no auto-charge). If you don't want to sign up, drop a prompt in the comments and I'll run it myself and post the result.
Try it at https://sup.ai. My dad (@scottam) is in the thread too. Would love blunt feedback, especially where this really works for you and where it falls short. If Sup ends up being useful, we added a Product Hunt offer that expires in a week: 20% off your first month with code PRODUCTHUNT.
If you're unsure what I meant by entropy and output token probability distributions: whenever an LLM outputs a token, it's choosing that token out of all possible tokens. Every possible output token has a probability assigned by the model. These sum to 1, forming a probability distribution. APIs typically return these probabilities as logprobs (logarithms of the probabilities) because raw probabilities for rare tokens can be so small they underflow to zero in floating point, and because logprobs fall naturally out of how models compute their distributions. We use these directly to calculate entropy. Entropy is a measure of uncertainty: it quantifies whether a token probability distribution is certain (one token has a 99.9% probability, and the rest share the leftover 0.1%) or uncertain (every token has roughly the same probability, so it's pretty much random which token is selected). Low entropy is the former case, and high entropy is the latter.
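The two cases above can be checked numerically. This is a small sketch of Shannon entropy computed from logprobs, assuming natural-log probabilities and a toy 11-token vocabulary; the `token_entropy` function and the example distributions are made up for illustration.

```python
import math

def token_entropy(logprobs):
    """Shannon entropy (in nats) of one token distribution.

    logprobs: natural-log probabilities for each candidate token.
    The values are renormalized first (log-sum-exp) in case they do not
    sum exactly to 1 in probability space.
    """
    m = max(logprobs)
    log_z = m + math.log(sum(math.exp(lp - m) for lp in logprobs))
    entropy = 0.0
    for lp in logprobs:
        log_p = lp - log_z
        entropy -= math.exp(log_p) * log_p
    return entropy

# Certain: one token holds 99.9%, the leftover 0.1% is split over 10 others.
certain = [math.log(0.999)] + [math.log(0.0001)] * 10
# Uncertain: 11 tokens with equal probability.
uncertain = [math.log(1 / 11)] * 11

print(token_entropy(certain))    # close to 0
print(token_entropy(uncertain))  # equals ln(11), the maximum for 11 tokens
```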
There is interesting research on the correlation of entropy with accuracy and hallucinations:
- https://www.nature.com/articles/s41586-024-07421-0
- https://arxiv.org/abs/2405.19648
- https://arxiv.org/abs/2509.04492 (when only a small number of probabilities are available, which is something we frequently deal with)
- https://arxiv.org/abs/2603.18940
- Tons more; happy to chat about it if interested
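When an API returns only the top few logprobs, the exact entropy is unknown, but it can be bracketed. The sketch below computes simple lower and upper bounds from the leftover probability mass. It is a basic baseline and not the method of any paper linked above; `entropy_bounds`, the vocabulary size, and the example logprobs are assumptions for illustration.

```python
import math

def entropy_bounds(top_logprobs, vocab_size):
    """Lower and upper bounds (nats) on full entropy from top-k logprobs.

    top_logprobs: natural-log probabilities of the k most likely tokens.
    vocab_size: total number of tokens the model can emit (assumed known).
    """
    probs = [math.exp(lp) for lp in top_logprobs]
    # Entropy contribution of the tokens we can see.
    known = sum(p * -lp for p, lp in zip(probs, top_logprobs))
    rest = max(0.0, 1.0 - sum(probs))  # probability mass outside the top k
    hidden = vocab_size - len(probs)   # number of unseen tokens
    if rest == 0.0 or hidden <= 0:
        return known, known
    # Lower bound: all leftover mass sits on a single unseen token.
    lower = known - rest * math.log(rest)
    # Upper bound: leftover mass is spread evenly over all unseen tokens.
    upper = known - rest * math.log(rest / hidden)
    return lower, upper

# Hypothetical top-3 logprobs with 5% of the mass unseen.
top3 = [math.log(0.7), math.log(0.2), math.log(0.05)]
lower, upper = entropy_bounds(top3, vocab_size=50_000)
print(lower, upper)
```

The gap between the two bounds grows with the unseen mass, so the estimate is tight exactly when the model is confident and the top-k tokens cover almost everything.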

About Sup AI on Product Hunt

“AI ensemble that scored #1 on Humanity's Last Exam”

Sup AI launched on Product Hunt on April 7th, 2026 and earned 105 upvotes and 7 comments, placing #15 on the daily leaderboard.

On the analytics side, Sup AI competes within Productivity, Writing and Artificial Intelligence — topics that collectively have 1.2M followers on Product Hunt. The dashboard above tracks how Sup AI performed against the three products that launched closest to it on the same day.

Who hunted Sup AI?

Sup AI was hunted by Ken Mueller. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.

For a complete overview of Sup AI including community comment highlights and product details, visit the product overview.
