This product has not been featured by Product Hunt yet, so it is not yet shown by default on their landing page.

[Dashboard panels: Product upvotes vs the next 3 · Product comments vs the next 3 · Product upvote speed vs the next 3 · Product upvotes and comments · Product vs the next 3]

LLM Tanks

A 3D tactical artillery game to evaluate LLM reasoning.

Traditional AI benchmarks and A/B testing platforms are excellent for measuring text generation and static knowledge, but they fall short when evaluating complex, multi-step tactical reasoning in a dynamic environment. Enter LLM Tanks: a full-stack 3D game that doubles as an interactive benchmark for evaluating AI tool-use and reasoning. At its core, LLM Tanks is a tactical artillery combat game that pits large language models directly against each other (e.g., Claude vs. Grok vs. GPT).

Top comment

Hi Product Hunt! 👋 I’m incredibly excited to launch LLM Tanks today. Over the last year, we’ve all stared at static AI benchmarks and blind A/B text comparisons. While these are great for measuring raw knowledge, I kept wondering: how do these models actually perform when forced to make multi-step, tactical decisions in a dynamic physical environment? That question led to LLM Tanks.

On the surface, it’s a fun 3D tactical artillery game built with SvelteKit and Three.js. But under the hood, it is a strict, real-world reasoning benchmark where top models (like GPT, Claude, and Grok) battle each other live. To make this a genuine apples-to-apples research tool, I instituted a strict "Equal Terms" architecture. Here is how it works:

- Zero Scripting: The AI opponents don’t rely on traditional video game logic or pathfinding. Everything you see is a language model actively reasoning on the fly.
- Identical Directives: Every model receives the exact same system prompt, physics constants, and JSON tool schemas. The playing field is entirely flat; differential performance reflects inherent model capability alone.
- The Tactical Capability Manifest: Models are given an arsenal of 8 specific tools, ranging from scan_for_enemy and optimize_shot_parameters to plan_movement and check_fuel_cost. They must use these tools to survey the 3D space, calculate ballistics, and maneuver.
- Forced Rationale: This is my favorite part. Every single tool call the AI makes requires a strict rationale object containing their intent, reasoning, expected outcome, and continuation. You aren’t just seeing the tank move; you are watching the model’s exact train of thought unfold as it tries to outsmart its opponent.

The result is a persistent global leaderboard powered by an Elo rating system, tracking model performance over time as they fight for tactical supremacy. I also added AI commentary via Inworld TTS so you can hear their cold, mathematical logic play out in real time, plus a Human vs. AI mode if you want to test yourself against the machines.

I would love for you to jump in, spectate a few AI battles, or challenge the models yourself. I’ll be here all day to answer your questions! I’m especially happy to nerd out about the prompt engineering, the OpenRouter integration, the SvelteKit/Cloudflare stack, or the wild differences I’ve seen in how various models approach problems. Let the battles begin! 💥
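To make the "Forced Rationale" requirement concrete, here is a minimal TypeScript sketch of what a single tool call might look like under the scheme the comment describes. The four tool names come from the comment itself; the field names, casing, and shapes are illustrative assumptions, not LLM Tanks’ actual schema.

```ts
// Every tool call must carry a rationale object, so the model's plan is
// visible alongside the action it takes ("Forced Rationale").
// NOTE: field names and shapes are assumptions for illustration.
interface Rationale {
  intent: string;          // what the model is trying to achieve right now
  reasoning: string;       // why this action serves that intent
  expectedOutcome: string; // what it predicts will happen
  continuation: string;    // what it plans to do next
}

// Four of the eight tools in the "Tactical Capability Manifest" are named
// in the launch comment; the other four are not, so they are omitted here.
type ToolName =
  | "scan_for_enemy"
  | "optimize_shot_parameters"
  | "plan_movement"
  | "check_fuel_cost";

interface ToolCall<TArgs> {
  tool: ToolName;
  args: TArgs;
  rationale: Rationale; // required on every single call
}

// Example: a firing-solution request a model might emit mid-battle.
const exampleCall: ToolCall<{ targetX: number; targetZ: number }> = {
  tool: "optimize_shot_parameters",
  args: { targetX: 42.5, targetZ: -17.0 },
  rationale: {
    intent: "Land a shell on the enemy tank behind the ridge",
    reasoning: "Last scan placed the enemy at (42.5, -17.0); a high arc clears the terrain",
    expectedOutcome: "A firing solution with elevation and power for an indirect hit",
    continuation: "Fire, then re-scan to confirm the hit or correct aim",
  },
};
```

Structuring the rationale as a required, typed field rather than free text is what lets spectators watch the model’s train of thought turn by turn instead of inferring it from moves alone.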
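The comment also mentions an OpenRouter integration, which is a natural fit for the "Identical Directives" rule: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so the same system prompt and JSON tool schemas can be sent to every competitor while only the model slug changes. A hedged sketch of that setup follows; the prompt text, model slugs, and parameter shapes are illustrative assumptions, not the game’s actual configuration.

```ts
// Sketch: one turn for one player. Everything is identical across models
// except the `model` argument, so performance differences reflect the model.
const SYSTEM_PROMPT =
  "You are a tank commander. Act only through the provided tools."; // assumed text

async function takeTurn(model: string, battleState: string) {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model, // e.g. "anthropic/claude-3.5-sonnet" vs "x-ai/grok-2": the only per-player difference
      messages: [
        { role: "system", content: SYSTEM_PROMPT }, // identical for all models
        { role: "user", content: battleState },     // same serialized game state
      ],
      tools: [
        {
          type: "function",
          function: {
            name: "optimize_shot_parameters",
            description: "Compute a firing solution for a target position",
            parameters: {
              type: "object",
              properties: {
                targetX: { type: "number" },
                targetZ: { type: "number" },
                rationale: { type: "object" }, // forced rationale, as sketched above
              },
              required: ["targetX", "targetZ", "rationale"],
            },
          },
        },
        // ...the remaining seven tools, with the same schemas for every model
      ],
    }),
  });
  return res.json();
}
```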
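Finally, the persistent leaderboard is described as Elo-based. The standard Elo update is compact enough to sketch; the K-factor and the example ratings below are assumptions, since the post does not publish the game’s actual constants.

```ts
// Standard Elo rating update. K controls how much one battle moves a rating;
// 32 is a common default and an assumption here, not LLM Tanks' actual value.
const K = 32;

// Probability that a player rated `a` beats a player rated `b`.
function expectedScore(a: number, b: number): number {
  return 1 / (1 + Math.pow(10, (b - a) / 400));
}

// Returns the updated [winner, loser] ratings after one battle.
function updateElo(winner: number, loser: number): [number, number] {
  const pWin = expectedScore(winner, loser);
  return [winner + K * (1 - pWin), loser - K * (1 - pWin)];
}

// Example: an upset win by the lower-rated model moves both ratings sharply.
console.log(updateElo(1450, 1550)); // ≈ [1470.5, 1529.5]
```

Because the favorite’s expected score is higher, an upset transfers more points than a predictable win, which is what lets the leaderboard converge on stable rankings as battles accumulate.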

About LLM Tanks on Product Hunt

A 3D tactical artillery game to evaluate LLM reasoning.

LLM Tanks was submitted on Product Hunt, where it earned 3 upvotes and 1 comment, placing #178 on the daily leaderboard.

On the analytics side, LLM Tanks competes within the A/B Testing, Artificial Intelligence, and Games topics, which collectively have 585k followers on Product Hunt. The dashboard above tracks how LLM Tanks performed against the three products that launched closest to it on the same day.

Who hunted LLM Tanks?

LLM Tanks was hunted by Dallas Gordon. A “hunter” on Product Hunt is the community member who submits a product to the platform, uploading the images and the link and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.

For a complete overview of LLM Tanks including community comment highlights and product details, visit the product overview.