VoxCPM
Tokenizer-free open-source text-to-speech with voice cloning across 30 languages
VoxCPM is an open-source text-to-speech model from OpenBMB that generates speech directly from continuous representations, skipping the discrete token step most TTS systems use. It supports 30 languages without language tags, voice cloning from a short reference clip, and voice design from natural-language descriptions.
The model is a 2B-parameter diffusion autoregressive architecture that produces 48 kHz audio and runs faster than real time: the real-time factor (RTF, synthesis time divided by audio duration) is around 0.3 on an RTX 4090, with roughly 8 GB of VRAM required. It is released under Apache 2.0, so it can be self-hosted for personal or commercial use without per-character fees. A hosted browser playground is also available for trying it without local setup.
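To make the RTF figure concrete, here is a minimal Python sketch of the arithmetic, assuming the usual definition of RTF as synthesis time divided by the duration of the audio produced. The function names are hypothetical helpers for illustration and are not part of VoxCPM; the 0.3 and 48,000 defaults are simply the figures quoted above, and actual speed will vary with hardware and input.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio.

    Values below 1.0 mean the audio is produced faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float = 0.3) -> float:
    """Estimate compute time for a clip, given an assumed RTF (0.3 as quoted above)."""
    return audio_seconds * rtf


def audio_seconds_from_samples(num_samples: int, sample_rate_hz: int = 48_000) -> float:
    """Convert a sample count to seconds at the quoted 48 kHz output rate."""
    return num_samples / sample_rate_hz


if __name__ == "__main__":
    clip_seconds = audio_seconds_from_samples(480_000)  # 480,000 samples at 48 kHz
    print(f"Clip length: {clip_seconds:.1f} s")
    print(f"Estimated synthesis time: {estimated_synthesis_seconds(clip_seconds):.1f} s")
```

Under these assumptions, a 10-second clip would take about 3 seconds to generate on the quoted hardware.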
Pricing: Free (open source); hosted playground uses one-time credit packs
VoxCPM Alternatives
LemonFox
Affordable speech-to-text and text-to-speech API with 100+ language support