ElevenLabs TTS
The ElevenLabs TTS provider enables your agent to use ElevenLabs' high-quality text-to-speech models for generating natural, expressive voice output with advanced voice cloning capabilities.
Installation
Install the ElevenLabs-enabled VideoSDK Agents package:
pip install "videosdk-plugins-elevenlabs"
Importing
from videosdk.plugins.elevenlabs import ElevenLabsTTS, VoiceSettings
Authentication
The ElevenLabs plugin requires an ElevenLabs API key.
Set ELEVENLABS_API_KEY in your .env file.
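The SDK reads ELEVENLABS_API_KEY from the environment on its own. If you want a missing key to fail fast, before the pipeline is built, a check like the following minimal sketch can help. `require_elevenlabs_key` is a hypothetical helper, not part of the plugin. It assumes the variables from your .env file are already in the process environment when the check runs.

```python
import os


def require_elevenlabs_key(env=None):
    """Return the ElevenLabs API key from the environment, or raise early.

    Hypothetical helper: the plugin reads ELEVENLABS_API_KEY by itself.
    This check only surfaces a missing key before the agent starts.
    """
    env = os.environ if env is None else env
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ELEVENLABS_API_KEY is not set; add it to your .env file "
            "and make sure the file is loaded before starting the agent."
        )
    return key
```

Call it once at startup, for example `require_elevenlabs_key()`, and let the SDK pick up the key as usual afterwards.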
Example Usage
from videosdk.plugins.elevenlabs import ElevenLabsTTS, VoiceSettings
from videosdk.agents import CascadingPipeline
# Configure voice settings (float values range from 0.0 to 1.0)
voice_settings = VoiceSettings(
    stability=0.71,
    similarity_boost=0.5,
    style=0.0,
    use_speaker_boost=True,
)
# Initialize the ElevenLabs TTS model
tts = ElevenLabsTTS(
    # Omit api_key when ELEVENLABS_API_KEY is set in your .env file
    api_key="your-elevenlabs-api-key",
    model="eleven_flash_v2_5",
    voice="EXAVITQu4vr4xnSDxMaL",
    speed=1.0,
    response_format="pcm_24000",
    enable_streaming=True,
    enable_ssml_parsing=False,
    apply_text_normalization="auto",
    auto_mode=True,
    voice_settings=voice_settings,
)
# Add the TTS model to a cascading pipeline
pipeline = CascadingPipeline(tts=tts)
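The example above requests response_format="pcm_24000", which is raw PCM audio rather than an encoded file. If you capture TTS output while debugging (how you obtain those bytes from the pipeline is outside the scope of this page), the sketch below wraps it in a WAV container with Python's standard `wave` module so it can be played back. It assumes "pcm_24000" means 16-bit signed little-endian mono samples at 24 kHz; confirm the exact format in the ElevenLabs docs. The helper names are hypothetical.

```python
import io
import wave

SAMPLE_RATE = 24_000  # "pcm_24000" -> 24 kHz
SAMPLE_WIDTH = 2      # assumed 16-bit signed little-endian samples
CHANNELS = 1          # assumed mono output


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a raw PCM buffer under the format assumptions above."""
    frame_size = SAMPLE_WIDTH * CHANNELS
    return len(pcm) / (frame_size * SAMPLE_RATE)


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM audio in a WAV container (header + frames) for playback."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()
```

For instance, one second of audio at this format is 48,000 bytes (24,000 samples of 2 bytes each), so `pcm_duration_seconds` returns 1.0 for a buffer of that size.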
Note: When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.
Configuration Options
- model: (str) The ElevenLabs model to use (e.g., "eleven_flash_v2_5", "eleven_multilingual_v2")
- voice: (str) Voice ID to use for audio output (default: "EXAVITQu4vr4xnSDxMaL")
- speed: (float) Speed of the generated audio (default: 1.0)
- api_key: (str) Your ElevenLabs API key (can also be set via the ELEVENLABS_API_KEY environment variable)
- response_format: (str) Audio format for output (default: "pcm_24000")
- voice_settings: (VoiceSettings) Advanced voice configuration options:
  - stability: (float) Voice stability (0.0 to 1.0, default: 0.71)
  - similarity_boost: (float) Voice similarity enhancement (0.0 to 1.0, default: 0.5)
  - style: (float) Voice style exaggeration (0.0 to 1.0, default: 0.0)
  - use_speaker_boost: (bool) Enable speaker boost for clarity (default: True)
- base_url: (str) Custom base URL for the ElevenLabs API (optional)
- enable_streaming: (bool) Enable real-time audio streaming (default: False)
- enable_ssml_parsing: (bool) Whether to enable SSML parsing (default: False)
- apply_text_normalization: (str) Controls text normalization (e.g., spelling out numbers). Modes:
  - "auto" (default): the system decides automatically
  - "on": always applied
  - "off": skipped
  Note: For the eleven_turbo_v2_5 and eleven_flash_v2_5 models, enabling text normalization requires an Enterprise plan.
- auto_mode: (bool) Reduces latency by disabling the chunk schedule and buffers. Recommended when sending full sentences or phrases.
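Several of these options have constrained values: the voice_settings floats range from 0.0 to 1.0, apply_text_normalization accepts "auto", "on", or "off", and normalization on the turbo/flash v2.5 models is tied to an Enterprise plan. The following minimal sketch checks those constraints before you construct the TTS model. `check_tts_config` is a hypothetical helper that works on a plain dict rather than the plugin's VoiceSettings class, and treating only "on" as "enabling" normalization is an assumption.

```python
from typing import Dict, List

UNIT_RANGE_FIELDS = ("stability", "similarity_boost", "style")
NORMALIZATION_MODES = {"auto", "on", "off"}
# Per the note above, enabling normalization on these models requires an Enterprise plan.
ENTERPRISE_NORMALIZATION_MODELS = {"eleven_turbo_v2_5", "eleven_flash_v2_5"}


def check_tts_config(
    model: str,
    voice_settings: Dict[str, float],
    apply_text_normalization: str = "auto",
    enterprise_plan: bool = False,
) -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems: List[str] = []
    # Float voice settings must stay within 0.0 to 1.0 (missing keys are skipped).
    for name in UNIT_RANGE_FIELDS:
        value = voice_settings.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is outside 0.0 to 1.0")
    # Only the three documented normalization modes are accepted.
    if apply_text_normalization not in NORMALIZATION_MODES:
        problems.append(
            f"apply_text_normalization must be one of {sorted(NORMALIZATION_MODES)}"
        )
    elif (
        apply_text_normalization == "on"
        and model in ENTERPRISE_NORMALIZATION_MODELS
        and not enterprise_plan
    ):
        problems.append(
            f"apply_text_normalization='on' with {model} requires an Enterprise plan"
        )
    return problems
```

For example, `check_tts_config("eleven_flash_v2_5", {"stability": 0.71}, "on")` returns a single message about the Enterprise plan requirement.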
Additional Resources
The following resources provide more information about using ElevenLabs with VideoSDK Agents SDK.
- ElevenLabs docs: ElevenLabs TTS docs.

