VoxCPM2 is a 2B open-source TTS model with 30-language support, 48kHz output, voice design from text alone, controllable voice cloning, and real-time streaming fast enough for production voice workflows.
VoxCPM2 is the next-generation open-source audio model from the @MiniCPM family, and it continues the family's signature trait of "capability density": packing all of these features into a model with only 2B parameters!
Despite its compact size, it brings a feature set that is rare for an open-source release:
Voice Design: Instead of hunting for the perfect reference clip to clone, you describe the voice in parentheses before the text, e.g. "(A young woman, gentle and sweet voice) Hello world." The model generates a completely new voice on the fly.
Native 48kHz Output: It has a built-in super-resolution VAE, meaning no external upsamplers are needed to get studio-quality audio.
Controllable Voice Cloning: You can clone a voice from a short clip, but still steer the emotion, pacing, and style using text prompts.
Production-Ready: It reaches a real-time factor (RTF) of about 0.13 in streaming mode and is fully open source under the Apache-2.0 license.
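Real-time factor (RTF) is generation time divided by the duration of the audio produced, so any value below 1.0 means synthesis outpaces playback, which is what makes streaming viable. The short Python sketch below just does that arithmetic for the quoted ~0.13. It assumes nothing about VoxCPM2's API, and the function names are hypothetical; the post also does not say which hardware the figure was measured on, so treat the result as illustrative.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Generation time divided by the duration of audio produced.
    Values below 1.0 mean synthesis runs faster than playback."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time_for(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock generation time for a clip at a given RTF."""
    return audio_seconds * rtf


def speedup_over_realtime(rtf: float) -> float:
    """How many times faster than real time generation runs (1 / RTF)."""
    return 1.0 / rtf


if __name__ == "__main__":
    # RTF quoted in the launch post; the measuring hardware is not stated there.
    rtf = 0.13
    print(f"10 s of audio takes about {processing_time_for(10.0, rtf):.1f} s to generate")
    print(f"That is roughly {speedup_over_realtime(rtf):.1f}x faster than real time")
```

At an RTF of 0.13, a 10-second clip takes about 1.3 s to generate, roughly 7.7 times faster than playback, which leaves headroom to stream audio chunks before the listener needs them.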
It is incredibly refreshing to see this level of controllable, high-fidelity audio hit the open-source ecosystem in such a lightweight package.
2B params delivering 48kHz + voice design + cloning is impressive capability density. As someone building an audio/video editing tool that relies on audio analysis for precise segment boundaries, I appreciate how much source quality matters.
Curious: how does VoxCPM2 handle multilingual switching within a single utterance — e.g. Japanese with embedded English terms?
Voice design from text prompts instead of hunting for a reference clip is the thing I didn't know I needed. "A tired middle-aged man reading terms of service" and it just... makes that? 2B parameters for this is wild. Will try it locally today.
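The launch post's voice-design example, "(A young woman, gentle and sweet voice) Hello world.", puts a parenthesized voice description in front of the text to be spoken. The minimal Python sketch below assembles strings in that shape. `voice_design_prompt` is a hypothetical helper: it only builds the input text and does not call VoxCPM2. Rejecting parentheses inside the description is a conservative choice of this sketch, not a documented model rule. Check the project's own documentation for the actual input conventions and inference API.

```python
def voice_design_prompt(description: str, text: str) -> str:
    """Prefix the text to be spoken with a parenthesized voice description,
    mirroring the "(A young woman, gentle and sweet voice) Hello world." example.
    Hypothetical helper: it builds the input string only and calls no model."""
    description = description.strip()
    text = text.strip()
    if not description or not text:
        raise ValueError("both description and text must be non-empty")
    # Assumption of this sketch: refuse nested parentheses so the
    # boundary between description and spoken text stays unambiguous.
    if "(" in description or ")" in description:
        raise ValueError("description should not contain parentheses")
    return f"({description}) {text}"


if __name__ == "__main__":
    print(voice_design_prompt("A young woman, gentle and sweet voice", "Hello world."))
    # The commenter's idea, split into a voice description and the text to read.
    print(voice_design_prompt(
        "A tired middle-aged man",
        "By continuing, you agree to the following terms of service.",
    ))
```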
About VoxCPM2 on Product Hunt
“Open-source 48kHz TTS with voice design and cloning”
VoxCPM2 launched on Product Hunt on April 13th, 2026 and earned 117 upvotes and 4 comments, placing #9 on the daily leaderboard.
VoxCPM2 was featured in Open Source (68.3k followers), Artificial Intelligence (466.2k followers), GitHub (41.2k followers) and Audio (2k followers) on Product Hunt. Together, these topics include over 120.2k products, making this a competitive space to launch in.
Who hunted VoxCPM2?
VoxCPM2 was hunted by Zac Zuo. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community.