An open-weight, native vision-language model built for long-horizon agentic tasks. Its hybrid architecture (linear attention + MoE) delivers the capabilities of a 397B giant with the inference speed of a 17B model.
Qwen3.5 is here. It is a native vision-language model with a massive 397B parameter count.
Built on the Qwen3-Next architecture (linear attention + MoE), the model activates only 17B parameters per forward pass. That hits a specific sweet spot: the reasoning depth of a giant model with the inference latency of a much smaller one.
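As a rough illustration of why sparse activation helps, here is a toy top-k MoE routing step in NumPy. The dimensions and the linear "experts" are made up for illustration; this is not Qwen3.5's actual router:

```python
import numpy as np

def topk_moe_layer(x, gate_w, experts, k=2):
    """Route one token through the top-k experts of a sparse MoE layer.

    x       : (d,) token hidden state
    gate_w  : (d, n_experts) router weights
    experts : list of (d, d) toy linear expert matrices
    Only k expert matmuls run per token, so compute scales with k,
    not with the total expert count.
    """
    logits = x @ gate_w                       # (n_experts,) router scores
    topk = np.argsort(logits)[-k:]            # indices of the k best experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                  # softmax over selected experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = topk_moe_layer(x, gate_w, experts, k=2)   # only 2 of 16 experts touched
```

The same idea at scale is how a 397B-parameter model can run a forward pass that only pays for 17B parameters.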
For agentic applications, this efficiency is the key point.
It is natively multimodal, with no glued-on vision adapters, and posts strong results on agentic tasks. That means handling complex, multi-step workflows without burning through tokens.
Apache 2.0 and ready for vLLM/SGLang out of the box!
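A minimal sketch of what a request to an OpenAI-compatible vLLM/SGLang endpoint could look like. The model ID, image URL, and port here are placeholders, not official values:

```python
# Sketch of a multimodal chat request for a vLLM/SGLang OpenAI-compatible
# server. "Qwen/Qwen3.5" and the image URL are placeholder assumptions.
payload = {
    "model": "Qwen/Qwen3.5",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/screenshot.png"}},
                {"type": "text",
                 "text": "Describe the UI elements you can interact with."},
            ],
        }
    ],
    "max_tokens": 512,
}
# POST this as JSON to http://localhost:8000/v1/chat/completions
```

Because both servers speak the OpenAI chat-completions format, existing agent frameworks can usually be pointed at a self-hosted endpoint with only a base-URL change.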
397B with only 17B active params is impressive efficiency. The hybrid linear attention + MoE approach seems like the right direction for long-horizon agentic tasks. As someone building a vision AI app for pet health, I'm always watching open-weight multimodal models closely — excited to benchmark this against our current pipeline. Congrats on the release!
Serving a 397B MoE native multimodal model for long-horizon agents will bottleneck on KV-cache growth and multimodal prefill latency, and expert-routing variance can reduce batching efficiency at high throughput. Best practice: run it under vLLM or SGLang with continuous batching plus paged KV cache, add aggressive prompt and image-embedding caching, and lean on FP8 where supported to keep cost predictable. Question: what max context length are you targeting for Qwen3.5 in production, and how stable is expert routing under long tool-using trajectories when served via vLLM or SGLang?
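To make the KV-cache point concrete, a back-of-envelope sketch. All dimensions below are hypothetical, not Qwen3.5's real config:

```python
def kv_cache_bytes(tokens, full_attn_layers, kv_heads, head_dim, dtype_bytes=2):
    """Back-of-envelope KV-cache size for the full-attention layers only.

    In a hybrid stack, linear-attention layers keep a fixed-size state, so
    KV growth comes only from the remaining full-attention layers:
    2 (K and V) * layers * heads * head_dim * bytes, per token.
    """
    return 2 * full_attn_layers * kv_heads * head_dim * dtype_bytes * tokens

# e.g. 12 full-attention layers, 8 KV heads of dim 128, fp16, 256k-token context
gb = kv_cache_bytes(256_000, 12, 8, 128, 2) / 1e9   # roughly 12.6 GB per sequence
```

Even with made-up numbers, the takeaway holds: per-sequence KV cost is what paged KV cache and prefix caching are fighting, and a hybrid design shrinks the base of that multiplication.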
Linear attention keeping latency flat across long tool-call chains is the part that actually matters for agents. Standard transformers get brutal once you're 50+ steps into a workflow with accumulated context. 17B active params on a 397B base with vLLM support out of the box makes self-hosting realistic too.
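A toy NumPy sketch of why the linear-attention state stays flat: the recurrence folds each new token into a fixed-size matrix instead of appending to a KV list (unnormalized linear attention, illustrative dimensions only):

```python
import numpy as np

def linear_attention_step(state, q, k, v):
    """One decode step of (unnormalized) linear attention.

    state accumulates sum_i outer(k_i, v_i), a fixed (d, d) matrix, so
    memory and per-step cost stay constant however long the trajectory
    gets. Softmax attention would instead store every past K/V pair.
    """
    state = state + np.outer(k, v)   # fold the new token into the state
    out = q @ state                  # attend via the running summary
    return state, out

d = 4
rng = np.random.default_rng(1)
state = np.zeros((d, d))
for step in range(1000):             # simulate 1000 "tool-call" steps
    q, k, v = rng.normal(size=(3, d))
    state, out = linear_attention_step(state, q, k, v)
# state is still (d, d): memory did not grow over the 1000 steps
```

That constant-size state is the mechanism behind flat decode latency deep into a tool-use trajectory.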
About Qwen3.5 on Product Hunt
“The 397B native multimodal agent with 17B active params”
Qwen3.5 launched on Product Hunt on February 17th, 2026, earning 312 upvotes and 5 comments and finishing as #3 Product of the Day.
Qwen3.5 was featured in Open Source (68.3k followers), Artificial Intelligence (466.2k followers) and Development (5.8k followers) on Product Hunt. Together, these topics include over 100.7k products, making this a competitive space to launch in.
Who hunted Qwen3.5?
Qwen3.5 was hunted by Zac Zuo. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.
Want to see how Qwen3.5 stacked up against nearby launches in real time? Check out the live launch dashboard for upvote speed charts, proximity comparisons, and more analytics.