Ollama v0.19 rebuilds Apple Silicon inference on top of MLX, bringing much faster local performance for coding and agent workflows. It also adds NVFP4 support and smarter cache reuse, snapshots, and eviction for more responsive sessions.
The engineering in Ollama v0.19 is a massive leap for anyone running local models on macOS. Moving to Apple's native MLX framework changes the game for performance, leveraging the unified memory architecture and the new GPU Neural Accelerators on the M5 chips.
v0.19 now also supports NVFP4, which brings local inference closer to production parity, and the KV cache has been reworked with cache reuse across conversations, intelligent checkpoints, and smarter eviction. For branching agent workflows like @Claude Code or @OpenClaw, that should mean lower memory use and faster responses.
If you have a Mac with 32GB+ of unified memory, you can pull the new Qwen3.5-35B-A3B NVFP4 model and test this right now. Running heavy agentic workflows locally just became a lot more viable!
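For anyone wanting to follow that suggestion, the usual pull-and-run pair is all it takes. Note that the exact model tag below is a guess based on the name in the comment, not a confirmed entry in the model library — check the Ollama model page for the real tag before pulling:

```shell
# Hypothetical tag for the Qwen3.5-35B-A3B NVFP4 build -- verify the
# actual name in the Ollama library first.
ollama pull qwen3.5:35b-a3b-nvfp4

# Run an interactive session (or pass a one-shot prompt as an argument).
ollama run qwen3.5:35b-a3b-nvfp4 "Refactor this function to be tail-recursive"
```

On a 32 GB machine, leave headroom for the OS and your editor; the unified memory pool is shared between the model weights and everything else you have open.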
Feels like this pushes local models from “nice for prototyping” into something closer to real infra. Most setups still break down once you try to run anything stateful or agent-like, so KV cache reuse across conversations feels like the bigger unlock here vs just raw speed. Curious how far this goes in practice — do you see people actually running longer-lived local agents now, or is it still mostly short-lived workflows?
Will have to try this out, as a previous version totally drowned my 16 GB mini.
The MLX rewrite is the real deal — been running Qwen3.5 locally on my M4 and the speed difference vs the old GGML backend is night and day. Cache reuse across conversations is clutch for agent loops too.
Finally, MLX-native inference. I've been running local models on my M2 Air for quick prototyping when I don't want to burn API credits, and the speed difference on Apple Silicon matters a lot when you're going back and forth between coding and testing. Curious how it handles the bigger models now, like 70B+ quantized. Does the memory management play nicer with other heavy processes running?
This is huge for local-first AI workflows. Curious how much real-world speedup people are seeing on M-series chips
Well done! Do all the current models work automatically with MLX with this version on macOS, or do you need to download a specific version of each model?
Been running Ollama since like v0.12 and the speed improvements keep blowing my mind. The MLX integration is huge for M-series Macs tbh.
Smarter cache reuse is the underrated feature here. I run a coding assistant locally and switching between projects used to basically cold start every time. If the KV cache actually persists across sessions that changes everything for agent workflows.
About Ollama v0.19 on Product Hunt
“Massive local model speedup on Apple Silicon with MLX”
Ollama v0.19 launched on Product Hunt on April 1st, 2026, earning 425 upvotes and 12 comments and finishing as #2 Product of the Day.
Ollama v0.19 was featured in Open Source (68.3k followers), Artificial Intelligence (466.2k followers) and Apple (15.4k followers) on Product Hunt. Together, these topics include over 103k products, making this a competitive space to launch in.
Who hunted Ollama v0.19?
Ollama v0.19 was hunted by Zac Zuo. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.
Want to see how Ollama v0.19 stacked up against nearby launches in real time? Check out the live launch dashboard for upvote speed charts, proximity comparisons, and more analytics.