Sarvam AI TTS

The Sarvam AI TTS provider enables your agent to use Sarvam AI's text-to-speech models for generating voice output.

Installation

Install the Sarvam AI-enabled VideoSDK Agents package:

pip install "videosdk-plugins-sarvamai"

Importing

from videosdk.plugins.sarvamai import SarvamAITTS

Authentication

The Sarvam plugin requires a Sarvam API key.

Set SARVAMAI_API_KEY in your .env file.
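Because a missing key usually only surfaces when the first synthesis request fails, it can help to check for it at startup. The following is a minimal sketch using only the standard library. The helper name `resolve_api_key` is hypothetical and not part of the VideoSDK or Sarvam SDKs, and the variable name `SARVAMAI_API_KEY` is the one given in the configuration options on this page. It assumes the .env file has already been loaded into the process environment.

```python
import os

# Variable name as listed under Configuration Options on this page.
ENV_VAR = "SARVAMAI_API_KEY"


def resolve_api_key(explicit_key=None, env=None):
    """Return the explicit key if one is given, else the value from the environment.

    Hypothetical helper for illustration only; it is not part of any SDK.
    Raises RuntimeError when no usable key is found.
    """
    env = os.environ if env is None else env
    key = (explicit_key or env.get(ENV_VAR, "")).strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set and no api_key was passed")
    return key
```

Calling `resolve_api_key()` once before building the pipeline turns a missing or empty key into an immediate, readable error rather than a failure in the middle of a call.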

Example Usage

from videosdk.plugins.sarvamai import SarvamAITTS
from videosdk.agents import CascadingPipeline

# Initialize the Sarvam AI TTS model
tts = SarvamAITTS(
    # Omit api_key entirely when SARVAMAI_API_KEY is set in .env
    api_key="your-sarvam-ai-api-key",
    model="bulbul:v2",
    speaker="anushka",
    language="en-IN",
    pitch=0.0,
    pace=1.0,
    loudness=1.0,
)

# Add tts to cascading pipeline
pipeline = CascadingPipeline(tts=tts)

Note

When credentials are stored in a .env file, do not also pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key and other credential parameters from your code.

Configuration Options

api_key: (str) Your Sarvam AI API key. Can also be set via the SARVAMAI_API_KEY environment variable.
model: (str) The Sarvam AI model to use, e.g. "bulbul:v2", "bulbul:v3", "bulbul:v3-beta" (default: "bulbul:v2").
speaker: (str) The speaker voice to use (default: "anushka").
language: (str) The language code for the generated audio (default: "en-IN").
enable_streaming: (bool) If True, uses WebSockets for low-latency streaming. If False, uses HTTP for batch synthesis (default: True).
sample_rate: (int) The audio sample rate in Hz (default: 8000).
output_audio_codec: (str) The output audio codec (default: "linear16").
pitch: (float | None) Pitch of the voice. Only supported on bulbul:v2. Range: [-0.75, 0.75]. Set to None to omit (default: 0.0).
pace: (float | None) Pace/speed of the voice. bulbul:v2: range [0.3, 3.0]; bulbul:v3/bulbul:v3-beta: range [0.5, 2.0]. Set to None to omit (default: 1.0).
loudness: (float | None) Loudness of the voice. Only supported on bulbul:v2. Range: [0.3, 3.0]. Set to None to omit (default: 1.0).
temperature: (float | None) Sampling temperature. Only supported on bulbul:v3 and bulbul:v3-beta. Range: [0.01, 1.0]. Set to None to omit (default: 0.6).
output_audio_bitrate: (str) Output audio bitrate. Allowed values: "32k", "64k", "96k", "128k", "192k" (default: "128k").
min_buffer_size: (int) Minimum character length that triggers buffer flushing (default: 50).
max_chunk_length: (int) Maximum chunk length for sentence splitting (default: 150).
enable_preprocessing: (bool) Controls normalization of English words and numeric entities (e.g., numbers, dates). Recommended for mixed-language text. Only supported on bulbul:v2 (default: False).
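Several of the options above are valid only for certain models, and pace has a different range on each. As a rough illustration, the sketch below checks a set of voice options against the ranges listed above before they are handed to the TTS constructor. The helper `build_voice_options` is hypothetical and not part of the SDK, and the ranges are copied from this page rather than read from the library.

```python
# Model groups and ranges are taken from the option list on this page.
V2_MODELS = {"bulbul:v2"}
V3_MODELS = {"bulbul:v3", "bulbul:v3-beta"}


def _check_range(name, value, low, high):
    """Raise ValueError if value is set and falls outside [low, high]."""
    if value is not None and not (low <= value <= high):
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")


def build_voice_options(model, pitch=None, pace=None, loudness=None, temperature=None):
    """Validate voice options for a model and return them as a dict.

    Hypothetical helper: options a model does not support must be left as None.
    """
    if model in V2_MODELS:
        if temperature is not None:
            raise ValueError("temperature is only supported on bulbul:v3 and bulbul:v3-beta")
        _check_range("pitch", pitch, -0.75, 0.75)
        _check_range("pace", pace, 0.3, 3.0)
        _check_range("loudness", loudness, 0.3, 3.0)
    elif model in V3_MODELS:
        if pitch is not None or loudness is not None:
            raise ValueError("pitch and loudness are only supported on bulbul:v2")
        _check_range("pace", pace, 0.5, 2.0)
        _check_range("temperature", temperature, 0.01, 1.0)
    else:
        raise ValueError(f"unknown model: {model!r}")
    return {
        "model": model,
        "pitch": pitch,
        "pace": pace,
        "loudness": loudness,
        "temperature": temperature,
    }
```

Under these assumptions, the returned dict could be unpacked into the constructor, for example `SarvamAITTS(**build_voice_options("bulbul:v3", pace=1.2, temperature=0.6))`, which leaves pitch and loudness as None so they are omitted.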

Additional Resources

The following resources provide more information about using Sarvam AI with the VideoSDK Agents SDK.

Sarvam docs: Sarvam's full docs site.

SDK Reference

GitHub Repository

Python Package
