Sarvam AI unveils Bulbul V3 for Indic text-to-speech
Third-party study across 11 languages; unlimited API access through 28 February 2026.
Sarvam AI has introduced Bulbul V3, a new version of its text-to-speech model aimed at Indian languages and accents. The company describes V3 as natural, expressive and production-ready, with an emphasis on reliability in call centre and telephony environments. According to the company, Bulbul V3 was assessed on several dimensions that affect real-world deployments, including naturalness, robustness and stability.
Evaluation and benchmarks
The model was evaluated in an independent, blind A/B human listening study conducted by Josh Talks across 11 languages. According to the company's summary, each language had 50 to 70 annotators, who generated roughly 2,000 votes per language and more than 20,000 votes overall. The study compared Bulbul V3 with systems such as ElevenLabs and Cartesia under two conditions: general full-band audio and 8 kHz telephony-grade audio. The company reports that Bulbul V3 was the most preferred model at 8 kHz and ranked competitively in the general full-band tests. The post also notes a low-latency streaming mode designed for near-real-time audio generation, which is relevant for voice agents and interactive applications.
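To put the vote counts in context, the sketch below estimates the sampling margin of error on a preference share at the reported sample sizes. It assumes every vote is an independent binomial trial and uses the normal approximation. That is a simplification, since each annotator cast many votes, and it is not the analysis used in the study. The function name `margin_of_error` is illustrative.

```python
import math


def margin_of_error(n_votes: int, share: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a vote share.

    Assumes independent votes; z = 1.96 corresponds to a 95% interval.
    share = 0.5 is the worst case (widest interval).
    """
    if n_votes <= 0:
        raise ValueError("n_votes must be positive")
    return z * math.sqrt(share * (1.0 - share) / n_votes)


if __name__ == "__main__":
    # Sample sizes taken from the article: ~2,000 votes per language, 20,000+ overall.
    for label, n in [("per language", 2_000), ("overall", 20_000)]:
        print(f"{label}: about +/-{margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, a per-language preference share carries an uncertainty of roughly two percentage points, so small gaps between systems within a single language should be read with caution.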
What is new for developers
Bulbul V3 ships with a refreshed voice library featuring more than 30 professionally recorded voices across 11 Indian languages. Sarvam AI says support will expand to 22 Indian languages in due course. The model also supports consent-based voice cloning, with safeguards for brand consistency and character identity.
According to the developer documentation, the V3 REST API accepts up to 2,500 characters per request and includes pace control, with output sample rates typically ranging from 8 kHz to 48 kHz. Responses can be returned in widely used formats, including WAV, MP3, AAC, Opus, FLAC, Linear16, Mulaw and Alaw. A streaming API is available for low-latency use cases.
How does Bulbul V3 reduce errors on messy Indian inputs?
The company defines stability through fine-grained error categories, including missing words, mispronunciations and extra content. To probe robustness on India-specific inputs, Sarvam AI describes an approach that estimates character error rates on numerics, STEM terms, Indian named entities, code-mixed and Romanised text, and abbreviations. According to the results shared by the company, Bulbul V3 achieved the lowest error rates in these categories among the systems compared. That can matter for workflows such as payments, healthcare reminders and compliance calls, where a single wrong digit or skipped term may break the task.
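Because of the 2,500-character request limit, longer scripts would need to be split before synthesis. The following is a minimal sketch of one way to do that, preferring sentence boundaries. The helper name `chunk_text` and the splitting rule are assumptions for illustration, not part of Sarvam AI's SDK. It makes no API call and only prepares the text.

```python
import re

MAX_CHARS = 2500  # per-request limit cited from the developer documentation


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after sentence-ending punctuation, including the Devanagari danda.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "Your appointment is confirmed. " * 200
    for i, chunk in enumerate(chunk_text(script), start=1):
        print(i, len(chunk))
```

Each chunk could then be sent as a separate request and the audio segments concatenated. Splitting at sentence ends rather than at arbitrary offsets avoids cutting a word or number in half, which would likely hurt prosody.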
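For readers unfamiliar with the metric, character error rate is conventionally the character-level edit distance between a reference string and a hypothesis, divided by the reference length. The sketch below implements that standard definition. For a text-to-speech system, one common setup (assumed here, not confirmed as Sarvam AI's method) is to transcribe the generated audio and compare the transcript with the input text. The example strings are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical input text and transcript of the synthesised audio.
    reference = "pay 4500 rupees"
    transcript = "pay 4600 rupees"
    print(f"CER = {character_error_rate(reference, transcript):.3f}")
```

A single wrong digit yields a small overall rate here, which is why reporting error rates per category, such as numerics, can be more informative than a single aggregate figure.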
Use cases and availability
Examples shared by the company highlight scenarios in BFSI collections, healthcare appointment booking and education, alongside support for Indian English and code-mixed speech. A no-code dashboard allows quick testing of different voices, and API access is available for integration into products and voice agents.
Sarvam AI, founded in 2023 by Dr Vivek Raghavan and Dr Pratyush Kumar, has been building a suite of speech and language tools tailored to Indian use cases. With Bulbul V3, the company is signalling a push towards production-grade deployments. According to the announcement, developers are being offered unlimited access to Bulbul V3 through 28 February 2026 to encourage trials and stress testing at scale.


