
Azure Voice Live API (Beta)

The Azure Voice Live API provider enables your agent to use Microsoft's comprehensive speech-to-speech solution for low-latency, high-quality voice interactions. This unified API eliminates the need to manually orchestrate multiple components by integrating speech recognition, generative AI, and text-to-speech into a single interface.

Preview Feature

This feature is currently in public preview. This preview is provided without a service-level agreement and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Installation

Install the Azure-enabled VideoSDK Agents package:

pip install "videosdk-plugins-azure"

Authentication

The Azure Voice Live plugin requires an Azure AI Services resource with a Cognitive Services endpoint.

Setup Steps:

  1. Create an AI Services resource for Speech in the Azure portal or from Azure AI Foundry.
  2. Get the AI Services resource endpoint and primary key. After your resource is deployed, select "Go to resource" to view and manage keys.

Set AZURE_VOICE_LIVE_ENDPOINT and AZURE_VOICE_LIVE_API_KEY in your .env file:

AZURE_VOICE_LIVE_ENDPOINT=your-azure-ai-service-endpoint
AZURE_VOICE_LIVE_API_KEY=your-azure-ai-service-primary-key
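
The SDK reads these variables automatically, but a fail-fast check at startup makes a missing setting easier to diagnose than a connection error later. The helper below is a sketch only; load_voice_live_credentials is a hypothetical function, not part of the SDK:

```python
import os

def load_voice_live_credentials(env=os.environ):
    """Return (endpoint, api_key), failing fast if either setting is missing.

    Hypothetical helper -- the SDK reads these environment variables on
    its own; this just surfaces misconfiguration at startup.
    """
    try:
        return env["AZURE_VOICE_LIVE_ENDPOINT"], env["AZURE_VOICE_LIVE_API_KEY"]
    except KeyError as exc:
        raise RuntimeError(
            f"Missing required setting {exc.args[0]}; add it to your .env file"
        ) from exc
```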

Importing

from videosdk.plugins.azure import AzureVoiceLive, AzureVoiceLiveConfig
from videosdk.agents import RealTimePipeline

Example Usage

from videosdk.plugins.azure import AzureVoiceLive, AzureVoiceLiveConfig
from videosdk.agents import RealTimePipeline

# Configure the Voice Live API settings
config = AzureVoiceLiveConfig(
    voice="en-US-EmmaNeural",  # Azure neural voice
    temperature=0.7,
    turn_detection_timeout=1000,
    enable_interruption=True
)

# Initialize the Azure Voice Live model
model = AzureVoiceLive(
    # When credentials are set in your .env file, don't pass them here
    # api_key="your-azure-speech-key",
    model="gpt-4o-realtime-preview",
    config=config
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)
note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK reads the environment variables automatically, so omit api_key, speech_region, and other credential parameters from your code.

note

To initiate a conversation with Azure Voice Live, the user must speak first. The model listens for user input to begin the interaction.

Configuration Options

  • model: The Voice Live model to use (e.g., "gpt-4o-realtime-preview", "gpt-4o-mini-realtime-preview")
  • api_key: Your Azure Speech API key (can also be set via environment variable)
  • speech_region: Your Azure Speech region (can also be set via environment variable)
  • credential: Azure DefaultAzureCredential for authentication (alternative to API key)
  • config: An AzureVoiceLiveConfig object for advanced options:
    • voice: (str) The Azure neural voice to use (e.g., "en-US-EmmaNeural", "hi-IN-AnanyaNeural")
    • temperature: (float) Sampling temperature for response randomness (default: 0.7)
    • turn_detection_timeout: (int) Timeout for turn detection in milliseconds
    • enable_interruption: (bool) Allow users to interrupt the agent during speech
    • noise_suppression: (bool) Enable noise suppression for clearer audio
    • echo_cancellation: (bool) Enable echo cancellation
    • phrase_list: (List[str]) Custom phrases for improved recognition accuracy

See it in Action

Explore a complete, end-to-end implementation of an agent using this provider in our AI Agent Quickstart Guide.

Additional Resources

The following resources provide more information about using Azure Voice Live with the VideoSDK Agents SDK.
