
BenchLLM by V7

Test-driven development for LLMs

Open Source
Developer Tools
Artificial Intelligence

Hunted by Alberto Rizzoli

Simplify the testing process for LLMs, chatbots, and other apps powered by AI. BenchLLM is a free open-source tool that allows you to test hundreds of prompts and responses on the fly. Automate evaluations and benchmark models to build better and safer AI.

Top comment

Hello Product Hunt! We built BenchLLM to offer a more versatile open-source benchmarking tool for AI applications. It lets you measure the accuracy of your models, agents, or chains by validating responses on any number of tests via LLMs. BenchLLM is actively used at V7 to improve our LLM applications and is now open-sourced under the MIT license to share with the wider community.

You can use it to:
- Test the responses of your LLM across any number of prompts.
- Implement continuous integration for chains like LangChain, agents like AutoGPT, or LLM models like Llama or GPT-4.
- Eliminate flaky chains and build confidence in your code.
- Spot inaccurate responses and hallucinations in your application at every version.

Key features:
- Automated tests and evaluations on any number of prompts and predictions via LLMs.
- Multiple evaluation methods: semantic similarity checks, string matching, and manual review.
- Caching of LLM responses to accelerate the testing and evaluation process.
- A comprehensive API and CLI for executing test suites and faster development iterations.

Here's a preview of a common use case in LLM testing and how popular models compare: https://www.loom.com/share/173c1...

Visit our GitHub repo to access examples, templates, and docs, or join our Discord to give feedback or contribute to the project!
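To make the evaluation methods above concrete, here is a minimal, self-contained sketch of how a response-validation step can work. This is illustrative only, not BenchLLM's actual API: the function names are invented, and the bag-of-words cosine similarity stands in for the LLM-based semantic checks a real tool would use.

```python
from collections import Counter
import math

def string_match(expected: str, actual: str) -> bool:
    """Strict check: the expected answer must appear verbatim (case-insensitive)."""
    return expected.lower() in actual.lower()

def cosine_similarity(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity (a real tool would use embeddings
    or an LLM judge instead of word counts)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def evaluate(expected: str, actual: str, threshold: float = 0.5) -> bool:
    """A test passes if either evaluator accepts the model's response."""
    return string_match(expected, actual) or cosine_similarity(expected, actual) >= threshold

print(evaluate("Paris", "The capital of France is Paris."))  # True
print(evaluate("Paris", "I do not know."))                   # False
```

Combining a cheap exact-match check with a looser similarity fallback, as sketched here, is one common way to catch paraphrased-but-correct answers while still flagging outright wrong ones.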

Comment highlights

Just curious, what are the benefits over LangSmith? And good luck with the launch!
This looks really interesting. @makers How would you recommend dealing with false positives? For example, even using semantic similarity, I imagine you sometimes get correct answers from an LLM that are flagged as incorrect?
Congrats on launching BenchLLM! 🎉 This versatile open-source benchmarking tool for AI applications sounds like a dream come true for developers. Can't wait to see it in action!

About BenchLLM by V7 on Product Hunt

Test-driven development for LLMs

BenchLLM by V7 launched on Product Hunt on July 21st, 2023, earning 133 upvotes and 15 comments and placing #10 on the daily leaderboard.

BenchLLM by V7 was featured in Open Source (68.3k followers), Developer Tools (511k followers) and Artificial Intelligence (466.2k followers) on Product Hunt. Together, these topics include over 163.3k products, making this a competitive space to launch in.

Who hunted BenchLLM by V7?

BenchLLM by V7 was hunted by Alberto Rizzoli. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.

Want to see how BenchLLM by V7 stacked up against nearby launches in real time? Check out the live launch dashboard for upvote speed charts, proximity comparisons, and more analytics.