Sarvam AI TTS
The Sarvam AI TTS provider enables your agent to use Sarvam AI's text-to-speech models for generating voice output.
Installation
Install the Sarvam AI-enabled VideoSDK Agents package:
pip install "videosdk-plugins-sarvamai"
Importing
from videosdk.plugins.sarvamai import SarvamAITTS
Authentication
The Sarvam plugin requires a Sarvam API key.
Set SARVAMAI_API_KEY in your .env file.
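If you want to fail fast at startup when the key is missing, the sketch below mimics the lookup order this page describes: an explicit api_key first, then the environment variable. It uses SARVAMAI_API_KEY, the name given in the configuration options on this page. resolve_sarvam_api_key is a hypothetical helper, not part of the plugin. It also assumes the .env file has already been loaded into the process environment, for example with python-dotenv's load_dotenv().

```python
import os
from typing import Optional

# Environment variable name as listed in the configuration options on this page.
ENV_VAR = "SARVAMAI_API_KEY"


def resolve_sarvam_api_key(explicit: Optional[str] = None) -> str:
    """Hypothetical helper: explicit key first, then the env var, else raise."""
    if explicit:
        return explicit
    key = os.environ.get(ENV_VAR)
    if not key:
        raise RuntimeError(
            f"{ENV_VAR} is not set; add it to your .env file or pass api_key explicitly"
        )
    return key
```

Call it once before building the pipeline. If the key comes from the environment, you can still construct SarvamAITTS without api_key. The helper only verifies that a key is present.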
Example Usage
from videosdk.plugins.sarvamai import SarvamAITTS
from videosdk.agents import CascadingPipeline
# Initialize the Sarvam AI TTS model
tts = SarvamAITTS(
    # Omit api_key if SARVAMAI_API_KEY is set in your .env file
    api_key="your-sarvam-ai-api-key",
    model="bulbul:v2",
    speaker="anushka",
    language="en-IN",
    pitch=0.0,      # bulbul:v2 only
    pace=1.0,
    loudness=1.0,   # bulbul:v2 only
)
# Add tts to cascading pipeline
pipeline = CascadingPipeline(tts=tts)
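The example above targets bulbul:v2. The options are model-specific: pitch and loudness are documented as bulbul:v2-only, and temperature as bulbul:v3/bulbul:v3-beta-only. Switching models therefore means setting the unsupported options to None, which omits them. The sketch below builds a keyword-argument dict under that assumption. tts_kwargs_for_model is a hypothetical helper. The speaker is left to the caller because the available voices may differ between models, so check Sarvam's docs for the voice list.

```python
from typing import Any, Dict

V2_MODELS = {"bulbul:v2"}
V3_MODELS = {"bulbul:v3", "bulbul:v3-beta"}


def tts_kwargs_for_model(model: str, speaker: str, language: str = "en-IN") -> Dict[str, Any]:
    """Hypothetical helper: SarvamAITTS keyword arguments with model-specific options."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "speaker": speaker,
        "language": language,
        "pace": 1.0,  # pace is documented for both model families
    }
    if model in V2_MODELS:
        # pitch and loudness apply only to bulbul:v2; omit temperature.
        kwargs.update(pitch=0.0, loudness=1.0, temperature=None)
    elif model in V3_MODELS:
        # temperature applies only to bulbul:v3 / bulbul:v3-beta; omit pitch and loudness.
        kwargs.update(pitch=None, loudness=None, temperature=0.6)
    else:
        raise ValueError(f"unknown Sarvam TTS model: {model}")
    return kwargs
```

With the plugin installed, pass the result as SarvamAITTS(**tts_kwargs_for_model("bulbul:v2", speaker="anushka")). The sketch itself does not import the plugin, so it runs anywhere.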
Note
When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key and other credential parameters from your code.
Configuration Options
- api_key: (str) Your Sarvam AI API key. Can also be set via the SARVAMAI_API_KEY environment variable.
- model: (str) The Sarvam AI model to use, e.g. "bulbul:v2", "bulbul:v3", "bulbul:v3-beta" (default: "bulbul:v2").
- speaker: (str) The speaker voice to use (default: "anushka").
- language: (str) The language code for the generated audio (default: "en-IN").
- enable_streaming: (bool) If True, uses WebSockets for low-latency streaming. If False, uses HTTP for batch synthesis (default: True).
- sample_rate: (int) The audio sample rate in Hz (default: 8000).
- output_audio_codec: (str) The output audio codec (default: "linear16").
- pitch: (float | None) Pitch of the voice. Only supported on bulbul:v2. Range: [-0.75, 0.75]. Set to None to omit (default: 0.0).
- pace: (float | None) Pace (speed) of the voice. Range: [0.3, 3.0] on bulbul:v2; [0.5, 2.0] on bulbul:v3 and bulbul:v3-beta. Set to None to omit (default: 1.0).
- loudness: (float | None) Loudness of the voice. Only supported on bulbul:v2. Range: [0.3, 3.0]. Set to None to omit (default: 1.0).
- temperature: (float | None) Sampling temperature. Only supported on bulbul:v3 and bulbul:v3-beta. Range: [0.01, 1.0]. Set to None to omit (default: 0.6).
- output_audio_bitrate: (str) Output audio bitrate. Allowed values: "32k", "64k", "96k", "128k", "192k" (default: "128k").
- min_buffer_size: (int) Minimum character length that triggers buffer flushing (default: 50).
- max_chunk_length: (int) Maximum chunk length for sentence splitting (default: 150).
- enable_preprocessing: (bool) Controls normalization of English words and numeric entities (e.g., numbers, dates). Recommended for mixed-language text. Only supported on bulbul:v2 (default: False).
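Because the valid ranges depend on the model, a small pre-flight check can catch mistakes before the first synthesis request. The sketch below copies the ranges and allowed bitrates from the configuration reference above. check_sarvam_tts_options is a hypothetical helper, not a plugin API. Its arguments default to None, meaning "not passed by you". It does not model how the plugin handles its own defaults for options a model doesn't support.

```python
from typing import List, Optional

# Inclusive (min, max) ranges taken from the configuration reference above.
V2_RANGES = {"pitch": (-0.75, 0.75), "pace": (0.3, 3.0), "loudness": (0.3, 3.0)}
V3_RANGES = {"pace": (0.5, 2.0), "temperature": (0.01, 1.0)}
ALLOWED_BITRATES = {"32k", "64k", "96k", "128k", "192k"}


def check_sarvam_tts_options(
    model: str,
    pitch: Optional[float] = None,
    pace: Optional[float] = None,
    loudness: Optional[float] = None,
    temperature: Optional[float] = None,
    output_audio_bitrate: Optional[str] = None,
) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    if model == "bulbul:v2":
        ranges = V2_RANGES
    elif model in ("bulbul:v3", "bulbul:v3-beta"):
        ranges = V3_RANGES
    else:
        return [f"unknown model {model!r}"]

    values = {"pitch": pitch, "pace": pace, "loudness": loudness, "temperature": temperature}
    problems: List[str] = []
    for name, value in values.items():
        if value is None:
            continue  # not passed, or explicitly omitted
        if name not in ranges:
            problems.append(f"{name} is not supported on {model}; set it to None")
            continue
        low, high = ranges[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}] for {model}")

    if output_audio_bitrate is not None and output_audio_bitrate not in ALLOWED_BITRATES:
        problems.append(f"output_audio_bitrate {output_audio_bitrate!r} is not allowed")
    return problems
```

For example, check_sarvam_tts_options("bulbul:v3", pace=2.5) reports that pace is outside [0.5, 2.0], while the same pace passes on bulbul:v2.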
Additional Resources
The following resources provide more information about using Sarvam AI with the VideoSDK Agents SDK.
- Sarvam docs: Sarvam's full docs site.

