GLM-4.6V is GLM's newest open-source multimodal model with a 128k context window. It features native function calling, bridging visual perception with executable actions for complex agentic workflows like web search and coding.
Wow, Z.ai looks incredible! That 128k context window on GLM-4.6V is a game changer. How well does the function calling handle nested tools in agentic workflows?
How consistent is the function calling when handling multi-step visual tasks?
Native function calling + 128k context is huge. This could be a game changer for building actual AI agents instead of just chatbots.
Big question - how does this compare to Claude or GPT-4V in real-world tasks? Any benchmarks? Also curious about API pricing when it goes live.
Seems like solid work! 🚀
Wait wait wait ...
It can analyze images and handle tasks for me? Nice, my lazy era has officially begun 😄
Gave Z.ai a spin on the train. Simple UI, snappy. Rumination mode took a bit but made cleaner steps. 128k context for free is wild. MIT license too. Really want to see how the function calling handles web/code tools. Bookmarking to test a longer task tonight.
Their progress is amazing and I actually love them now more than I loved ChatGPT when it came out. Somehow ChatGPT, even though they keep improving the GPT models, seems to stagnate, whereas Z.ai really shines with their open-source releases. (Which also shows that OpenAI should have stayed open; I am pretty sure they would rock the generative AI world by now.)
Cool! We used Z.ai for our hackathon in Slovakia (actually, they were our main sponsor), so happy to see how they progress even on PH :)
About GLM-4.6V on Product Hunt
“Open-source multimodal model with native tool use”
GLM-4.6V launched on Product Hunt on December 9th, 2025 and earned 247 upvotes and 11 comments, placing #4 on the daily leaderboard.
GLM-4.6V was featured in Open Source (68.3k followers), Artificial Intelligence (466.2k followers) and Development (5.8k followers) on Product Hunt. Together, these topics include over 100.7k products, making this a competitive space to launch in.
Who hunted GLM-4.6V?
GLM-4.6V was hunted by Zac Zuo. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.
Want to see how GLM-4.6V stacked up against nearby launches in real time? Check out the live launch dashboard for upvote speed charts, proximity comparisons, and more analytics.
Hi everyone!
GLM-4.6V is a significant iteration of the GLM multimodal series. It scales the training context window to 128k and achieves SOTA visual understanding for its size.
The biggest update here is native function calling. For the first time in the GLM architecture, tool use is integrated directly into the visual model. This effectively bridges the gap from "visual perception" to "executable action"; a sketch of what a request can look like follows below.
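To make that concrete, here is a minimal sketch of a tool-augmented vision request, assuming GLM-4.6V is served behind an OpenAI-compatible endpoint. The base_url, the "glm-4.6v" model identifier, and the web_search tool schema are illustrative assumptions, not official values; check the Z.ai API docs for the real ones.

```python
# Minimal sketch: an image plus a tool definition in one request,
# assuming an OpenAI-compatible endpoint (base_url and model name
# are assumptions, not confirmed by this launch post).
from openai import OpenAI

client = OpenAI(base_url="https://api.z.ai/api/paas/v4", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool for the model to call
        "description": "Search the web for product listings and prices.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="glm-4.6v",  # assumed model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/product.jpg"}},
            {"type": "text",
             "text": "Find cheaper listings for this product."},
        ],
    }],
    tools=tools,
)

# If the model decides to act, the call arrives as structured JSON
# rather than free text, ready to dispatch to your own tool code.
print(response.choices[0].message.tool_calls)
```

The point of "native" function calling is exactly that last line: the action comes back as a structured tool call grounded in what the model saw, not as prose you have to parse.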
It can automatically generate high-quality image-text interleaved content and handle complete workflows independently, like viewing products, comparing prices, and generating shopping lists (see the agent-loop sketch after this paragraph). The frontend replication and visual interaction capabilities are also impressive and significantly shorten the path from design to code for developers.
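To ground the workflow claim, here is a hedged sketch of the agent loop that would drive such a multi-step task end to end, under the same OpenAI-compatible assumptions as above. The execute_tool callable is a hypothetical user-supplied dispatcher, not part of any official SDK.

```python
# Illustrative agent loop: run tool calls until the model answers
# in plain text (e.g., a finished shopping list). Everything here
# is a sketch under the assumptions stated above.
import json


def run_tool_loop(client, messages, tools, execute_tool, model="glm-4.6v"):
    """Drive the model until it stops requesting tools.

    execute_tool: callable mapping (name, args_dict) -> result string.
    """
    while True:
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # final text answer
        messages.append(msg)  # keep the assistant's tool-call turn
        for call in msg.tool_calls:
            result = execute_tool(
                call.function.name, json.loads(call.function.arguments)
            )
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
```

This is the standard tool-calling loop pattern; the model's own contribution is deciding, from the image and conversation, which tool to call next and when it is done.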
Try it on Z.ai or find the open weights on HF.
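For local experimentation with the open weights, a minimal loading sketch with Hugging Face transformers follows. The repo id "zai-org/GLM-4.6V" is an assumption based on the naming of earlier GLM releases; check the actual model card on HF before running this.

```python
# Hedged sketch of loading the open weights locally; the repo id is
# an assumption, and vision-language checkpoints typically pair an
# AutoProcessor (text + image preprocessing) with the model class.
from transformers import AutoModelForCausalLM, AutoProcessor

repo = "zai-org/GLM-4.6V"  # assumed repo id; verify on Hugging Face
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    device_map="auto",   # requires accelerate; shards across GPUs
    torch_dtype="auto",  # use the checkpoint's native precision
)
```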