Azure TTS
The Azure TTS provider enables your agent to use Microsoft Azure's high-quality text-to-speech models for generating natural-sounding voice output with advanced voice tuning and expressive speaking styles.
Installation
Install the Azure-enabled VideoSDK Agents package:
pip install "videosdk-plugins-azure"
Importing
from videosdk.plugins.azure import AzureTTS, VoiceTuning, SpeakingStyle
Authentication
The Azure TTS plugin requires an Azure AI Speech Service resource.
Setup Steps:
- Create an AI Services resource for Speech in the Azure portal or from Azure AI Foundry.
- Get the Speech resource key and region. After your Speech resource is deployed, select "Go to resource" to view and manage keys.
Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION in your .env file:
AZURE_SPEECH_KEY=your-azure-speech-key
AZURE_SPEECH_REGION=your-azure-region
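Before starting an agent, it can help to confirm that both variables are actually visible to the process. The following is a minimal sketch, not part of the plugin: the helper name missing_speech_vars is hypothetical, and it assumes key-plus-region authentication and that your .env file has already been loaded into the process environment by whatever mechanism your application uses.

```python
import os

# The two variables named in the setup steps above.
REQUIRED_VARS = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION")


def missing_speech_vars(env=None):
    """Return the names of required variables that are unset or blank.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching real settings.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_speech_vars()
    if missing:
        print("Missing Azure Speech settings: " + ", ".join(missing))
    else:
        print("Azure Speech credentials found in the environment.")
```

Returning the list of missing names, rather than raising immediately, lets the caller decide whether to fail fast or fall back to another authentication option.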
Example Usage
from videosdk.plugins.azure import AzureTTS, VoiceTuning, SpeakingStyle
from videosdk.agents import CascadingPipeline

# Configure voice tuning for prosody control
voice_tuning = VoiceTuning(
    rate="fast",
    volume="loud",
    pitch="high"
)

# Configure speaking style for expressive speech
speaking_style = SpeakingStyle(
    style="cheerful",
    degree=1.5
)

# Initialize the Azure TTS model
tts = AzureTTS(
    voice="en-US-EmmaNeural",
    language="en-US",
    tuning=voice_tuning,
    style=speaking_style
)

# Add tts to cascading pipeline
pipeline = CascadingPipeline(tts=tts)
When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit speech_key, speech_region, and other credential parameters from your code.
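For intuition about what the tuning and style settings in the example control, the sketch below builds the kind of Azure SSML document those values correspond to: a prosody element for rate, volume, and pitch, wrapped in an mstts:express-as element for the speaking style. The function build_ssml is hypothetical and is not the plugin's API. Whether the plugin generates SSML in exactly this shape internally is an assumption.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-EmmaNeural", language="en-US",
               rate="medium", volume="medium", pitch="medium",
               style=None, degree=1.0):
    """Build an illustrative Azure SSML string (hypothetical helper)."""
    # Prosody carries the rate, volume, and pitch adjustments.
    body = (
        f"<prosody rate={quoteattr(rate)} volume={quoteattr(volume)} "
        f"pitch={quoteattr(pitch)}>{escape(text)}</prosody>"
    )
    # The speaking style and its intensity wrap the prosody element.
    if style is not None:
        body = (
            f"<mstts:express-as style={quoteattr(style)} "
            f"styledegree={quoteattr(str(degree))}>{body}</mstts:express-as>"
        )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Hello there!", rate="fast", volume="loud",
                     pitch="high", style="cheerful", degree=1.5))
```

Escaping the text and quoting the attribute values keeps the markup well-formed even when the spoken text contains characters such as & or <.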
Configuration Options
- speech_key: (Optional[str]) Azure Speech API key. Uses AZURE_SPEECH_KEY environment variable if not provided.
- speech_region: (Optional[str]) Azure Speech region (e.g., "eastus", "westus2"). Uses AZURE_SPEECH_REGION environment variable if not provided.
- speech_endpoint: (Optional[str]) Custom endpoint URL. Uses AZURE_SPEECH_ENDPOINT environment variable if not provided.
- voice: (str) Voice name to use for audio output (default: "en-US-EmmaNeural"). Get available voices using the Azure voices API.
- language: (str) Language code (optional, inferred from voice if not specified).
- tuning: (VoiceTuning) Voice tuning object for rate, volume, and pitch control:
  - rate: (str) Speaking rate ("x-slow", "slow", "medium", "fast", "x-fast" or percentage like "50%")
  - volume: (str) Speaking volume ("silent", "x-soft", "soft", "medium", "loud", "x-loud" or percentage)
  - pitch: (str) Voice pitch ("x-low", "low", "medium", "high", "x-high" or frequency like "+50Hz")
- style: (SpeakingStyle) Speaking style object for expressive speech:
  - style: (str) Speaking style (e.g., "cheerful", "sad", "angry", "excited", "friendly")
  - degree: (float) Style intensity from 0.01 to 2.0 (default: 1.0)
- deployment_id: (str) Custom deployment ID for custom models.
- speech_auth_token: (str) Authorization token for authentication.
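Because the tuning values are plain strings, a typo such as "quick" instead of "fast" is easy to miss. The sketch below checks values against only the forms listed above (the named levels, a percentage, or a Hz offset for pitch) and checks the documented degree range. The helpers check_tuning and check_degree are hypothetical and not part of the plugin. The plugin or Azure may accept forms beyond those listed here, so treat a reported problem as a prompt to double-check rather than a definitive rejection.

```python
import re

# Named levels as listed in the configuration options above.
RATE_NAMES = {"x-slow", "slow", "medium", "fast", "x-fast"}
VOLUME_NAMES = {"silent", "x-soft", "soft", "medium", "loud", "x-loud"}
PITCH_NAMES = {"x-low", "low", "medium", "high", "x-high"}

# Assumed patterns for the "percentage" and "frequency" forms.
_PERCENT = re.compile(r"[+-]?\d+(\.\d+)?%")
_HERTZ = re.compile(r"[+-]?\d+(\.\d+)?Hz")


def check_tuning(rate="medium", volume="medium", pitch="medium"):
    """Return a list of problems; an empty list means every value matched."""
    problems = []
    if rate not in RATE_NAMES and not _PERCENT.fullmatch(rate):
        problems.append(f"rate: {rate!r}")
    if volume not in VOLUME_NAMES and not _PERCENT.fullmatch(volume):
        problems.append(f"volume: {volume!r}")
    if pitch not in PITCH_NAMES and not _HERTZ.fullmatch(pitch):
        problems.append(f"pitch: {pitch!r}")
    return problems


def check_degree(degree):
    """True if the style intensity lies in the documented 0.01 to 2.0 range."""
    return 0.01 <= degree <= 2.0


if __name__ == "__main__":
    print(check_tuning(rate="quick", volume="loud", pitch="+50Hz"))
    print(check_degree(1.5))
```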
Voice Selection
You can find available voices using the Azure Voices List API (replace eastus2 in the URL with your Speech resource's region):
curl --location --request GET 'https://eastus2.tts.speech.microsoft.com/cognitiveservices/voices/list' \
--header 'Ocp-Apim-Subscription-Key: YOUR_SPEECH_KEY'
Popular voice options include:
- en-US-EmmaNeural (Female, neutral)
- en-US-BrianNeural (Male, neutral)
- en-US-AriaNeural (Female, cheerful)
- en-GB-SoniaNeural (Female, British)
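The same voices list request can be made from Python and filtered by locale or supported style. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the response field names ShortName, Locale, and StyleList are assumptions based on the voices list response format, so verify them against the JSON your region returns.

```python
import json
import urllib.request


def voices_list_url(region):
    """Build the voices list endpoint for a Speech resource region."""
    return f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"


def fetch_voices(region, speech_key, timeout=10):
    """Fetch the voice catalog as a list of dicts (performs a network call)."""
    request = urllib.request.Request(
        voices_list_url(region),
        headers={"Ocp-Apim-Subscription-Key": speech_key},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def short_names(voices, locale=None, style=None):
    """Return voice short names, optionally filtered by locale and style."""
    names = []
    for voice in voices:
        if locale is not None and voice.get("Locale") != locale:
            continue
        # Voices without expressive styles may omit the style list entirely.
        if style is not None and style not in voice.get("StyleList", []):
            continue
        names.append(voice.get("ShortName"))
    return names
```

For example, short_names(fetch_voices("eastus2", key), locale="en-US", style="cheerful") would list the en-US voices that advertise a cheerful style, which is useful when pairing a voice with a SpeakingStyle.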
Additional Resources
The following resources provide more information about using Azure with the VideoSDK Agents SDK.
- Azure Speech Service Overview: Complete overview of Azure Speech services.
- Azure TTS docs: Azure Text-to-Speech documentation.
- Voice Selection Guide: Guide for selecting synthesis language and voice.
- Speech Synthesis Markup: Learn about prosody adjustments and voice tuning.