Azure Voice Live API (Beta)
The Azure Voice Live API provider enables your agent to use Microsoft's comprehensive speech-to-speech solution for low-latency, high-quality voice interactions. This unified API eliminates the need to manually orchestrate multiple components by integrating speech recognition, generative AI, and text-to-speech into a single interface.
This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Installation
Install the Azure-enabled VideoSDK Agents package:
```bash
pip install "videosdk-plugins-azure"
```
Authentication
The Azure Voice Live plugin requires an Azure AI Services resource with a Cognitive Services endpoint.
Setup Steps:
- Create an AI Services resource for Speech in the Azure portal or from Azure AI Foundry.
- Get the AI Services resource endpoint and primary key. After your resource is deployed, select "Go to resource" to view and manage keys.
Set `AZURE_VOICE_LIVE_ENDPOINT` and `AZURE_VOICE_LIVE_API_KEY` in your `.env` file:

```
AZURE_VOICE_LIVE_ENDPOINT=your-azure-ai-service-endpoint
AZURE_VOICE_LIVE_API_KEY=your-azure-ai-service-primary-key
```
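If your entry-point script doesn't load the `.env` file for you, here is a minimal sketch using python-dotenv (an assumption; any mechanism that exports these variables works):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Read AZURE_VOICE_LIVE_ENDPOINT and AZURE_VOICE_LIVE_API_KEY from .env
load_dotenv()

for var in ("AZURE_VOICE_LIVE_ENDPOINT", "AZURE_VOICE_LIVE_API_KEY"):
    if not os.getenv(var):
        raise RuntimeError(f"Missing required environment variable: {var}")
```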
Importing
```python
from videosdk.plugins.azure import AzureVoiceLive, AzureVoiceLiveConfig
from videosdk.agents import RealTimePipeline
```
Example Usage
```python
from videosdk.plugins.azure import AzureVoiceLive, AzureVoiceLiveConfig
from videosdk.agents import RealTimePipeline

# Configure the Voice Live API settings
config = AzureVoiceLiveConfig(
    voice="en-US-EmmaNeural",  # Azure neural voice
    temperature=0.7,
    turn_detection_timeout=1000,
    enable_interruption=True,
)

# Initialize the Azure Voice Live model
model = AzureVoiceLive(
    # When environment variables are set in .env, DON'T pass credentials:
    # api_key="your-azure-speech-key",
    model="gpt-4o-realtime-preview",
    config=config,
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)
```
When using a `.env` file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit `api_key`, `speech_region`, and other credential parameters from your code.
To initiate a conversation with Azure Voice Live, the user must speak first. The model listens for user input to begin the interaction.
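To see where the pipeline fits into a running agent, here is a hedged sketch following the pattern in the quickstart guide; the `Agent` subclass, the `instructions` prompt, and the `AgentSession`/`start()` calls are assumptions based on that guide, not Azure-specific APIs:

```python
import asyncio

from videosdk.agents import Agent, AgentSession, RealTimePipeline
from videosdk.plugins.azure import AzureVoiceLive

class VoiceAgent(Agent):
    def __init__(self):
        # Hypothetical system prompt passed to the realtime model
        super().__init__(instructions="You are a helpful voice assistant.")

async def main():
    model = AzureVoiceLive(model="gpt-4o-realtime-preview")  # credentials come from .env
    pipeline = RealTimePipeline(model=model)
    session = AgentSession(agent=VoiceAgent(), pipeline=pipeline)
    await session.start()  # the agent then waits for the user to speak first

asyncio.run(main())
```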
Configuration Options
- `model`: The Voice Live model to use (e.g., `"gpt-4o-realtime-preview"`, `"gpt-4o-mini-realtime-preview"`)
- `api_key`: Your Azure Speech API key (can also be set via environment variable)
- `speech_region`: Your Azure Speech region (can also be set via environment variable)
- `credential`: An Azure `DefaultAzureCredential` for authentication (alternative to an API key; see the sketch after this list)
- `config`: An `AzureVoiceLiveConfig` object for advanced options:
  - `voice`: (str) The Azure neural voice to use (e.g., `"en-US-EmmaNeural"`, `"hi-IN-AnanyaNeural"`)
  - `temperature`: (float) Sampling temperature for response randomness (default: `0.7`)
  - `turn_detection_timeout`: (int) Timeout for turn detection, in milliseconds
  - `enable_interruption`: (bool) Allow users to interrupt the agent mid-speech
  - `noise_suppression`: (bool) Enable noise suppression for clearer audio
  - `echo_cancellation`: (bool) Enable echo cancellation
  - `phrase_list`: (List[str]) Custom phrases for improved recognition accuracy
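A sketch combining these options, using `DefaultAzureCredential` (from the `azure-identity` package) as the assumed alternative to an API key; the option values are illustrative, not recommended defaults:

```python
from azure.identity import DefaultAzureCredential
from videosdk.plugins.azure import AzureVoiceLive, AzureVoiceLiveConfig

config = AzureVoiceLiveConfig(
    voice="hi-IN-AnanyaNeural",
    temperature=0.5,
    turn_detection_timeout=800,   # end the user's turn after 800 ms of silence
    enable_interruption=True,
    noise_suppression=True,
    echo_cancellation=True,
    phrase_list=["VideoSDK", "Voice Live"],  # bias recognition toward these terms
)

model = AzureVoiceLive(
    model="gpt-4o-mini-realtime-preview",
    credential=DefaultAzureCredential(),  # token-based auth instead of api_key
    config=config,
)
```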
See it in Action
Explore a complete, end-to-end implementation of an agent using this provider in our AI Agent Quickstart Guide.
Additional Resources
The following resources provide more information about using Azure Voice Live with the VideoSDK Agents SDK.
- Azure Voice Live API Documentation: Complete Azure Voice Live API documentation.
- Azure Speech Service Overview: Overview of Azure Speech services.
Got a question? Ask us on Discord.