Version: 1.0.x

Azure Voice Live API (Beta)

The Azure Voice Live API provider enables your agent to use Microsoft's comprehensive speech-to-speech solution for low-latency, high-quality voice interactions. This unified API eliminates the need to manually orchestrate multiple components by integrating speech recognition, generative AI, and text-to-speech into a single interface.

Preview Feature

This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Installation

Install the Azure-enabled VideoSDK Agents package:

pip install "videosdk-plugins-azure"

Authentication

The Azure Voice Live plugin requires an Azure AI Services resource with a Cognitive Services endpoint.

Setup Steps:

  1. Create an AI Services resource for Speech in the Azure portal or from Azure AI Foundry.
  2. After the resource is deployed, select "Go to resource" to view and manage keys, then copy the resource endpoint and primary key.

Set AZURE_VOICE_LIVE_ENDPOINT and AZURE_VOICE_LIVE_API_KEY in your .env file:

AZURE_VOICE_LIVE_ENDPOINT=your-azure-ai-service-endpoint
AZURE_VOICE_LIVE_API_KEY=your-azure-ai-service-primary-key
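Before starting an agent, it can help to confirm that both variables are actually visible to the process. The following is a minimal sketch using only the standard library; the helper name missing_voice_live_vars is hypothetical and not part of the VideoSDK plugin, and it assumes the .env file has already been loaded into the process environment.

```python
import os

# The two variable names documented for the Azure Voice Live plugin.
REQUIRED_VARS = ("AZURE_VOICE_LIVE_ENDPOINT", "AZURE_VOICE_LIVE_API_KEY")


def missing_voice_live_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling missing_voice_live_vars() with no arguments checks the current environment; an empty list means both values are present.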

Importing

from videosdk.agents.plugins import AzureVoiceLive, AzureVoiceLiveConfig
from videosdk.agents import Pipeline

Example Usage

from videosdk.agents.plugins import AzureVoiceLive, AzureVoiceLiveConfig
from videosdk.agents import Pipeline

# Configure the Voice Live API settings
config = AzureVoiceLiveConfig(
    voice="en-US-EmmaNeural",  # Azure neural voice
    temperature=0.7,
    turn_detection_threshold=0.5,
    turn_detection_silence_duration_ms=500,
)

# Initialize the Azure Voice Live model
model = AzureVoiceLive(
    # When environment variables are set in .env - DON'T pass credentials
    # api_key="your-azure-speech-key",
    model="gpt-4o-realtime-preview",
    config=config,
)

# Create the pipeline with the model
pipeline = Pipeline(llm=model)

Note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads the environment variables, so omit api_key, endpoint, and other credential parameters from your code.

Note

To initiate a conversation with Azure Voice Live, the user must speak first. The model listens for user input to begin the interaction.

Configuration Options

  • model: The Voice Live model to use (e.g., "gpt-4o-realtime-preview", "gpt-4o-mini-realtime-preview").
  • api_key: Your Azure Voice Live API key (can also be set via the AZURE_VOICE_LIVE_API_KEY environment variable).
  • endpoint: Your Azure Voice Live endpoint (can also be set via the AZURE_VOICE_LIVE_ENDPOINT environment variable).
  • credential: Azure credential object (AzureKeyCredential or TokenCredential) for authentication (alternative to API key; takes precedence when provided).
  • config: An AzureVoiceLiveConfig object for advanced options:
    • voice: (str) The voice to use for audio output. Can be an Azure neural voice (e.g., "en-US-AvaNeural") or an OpenAI voice (e.g., "alloy", "echo") (default: "en-US-AvaNeural").
    • modalities: (List[Modality]) List of enabled response types (default: [Modality.TEXT, Modality.AUDIO]).
    • input_audio_format: (AudioFormat) Audio format for input (default: AudioFormat.PCM16).
    • output_audio_format: (AudioFormat) Audio format for output (default: AudioFormat.PCM16).
    • turn_detection_threshold: (float) Voice activity detection threshold from 0.0 to 1.0 (default: 0.5).
    • turn_detection_prefix_padding_ms: (int) Padding before speech start in milliseconds (default: 300).
    • turn_detection_silence_duration_ms: (int) Silence duration to mark end of turn in milliseconds (default: 500).
    • temperature: (float or None) Sampling temperature for response randomness (default: None).
    • max_completion_tokens: (int or None) Maximum number of tokens in the response (default: None).
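
The turn detection options above have documented ranges and types that are easy to get wrong when values come from user input or a config file. The following is a minimal sketch of a pre-flight check; check_turn_detection is a hypothetical helper, not part of the plugin, and the requirement that the millisecond values be non-negative is an assumption rather than something the SDK is documented to enforce. The defaults mirror the ones listed above.

```python
def check_turn_detection(threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500):
    """Return a list of human-readable problems; an empty list means the values look sane.

    Hypothetical helper for illustration only.
    """
    problems = []
    # Documented range for the voice activity detection threshold.
    if not 0.0 <= threshold <= 1.0:
        problems.append(f"threshold must be between 0.0 and 1.0, got {threshold}")
    # Assumption: durations are whole, non-negative milliseconds.
    for name, value in (
        ("prefix_padding_ms", prefix_padding_ms),
        ("silence_duration_ms", silence_duration_ms),
    ):
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{name} must be a non-negative integer, got {value!r}")
    return problems
```

If the returned list is empty, the same values can be passed to AzureVoiceLiveConfig as turn_detection_threshold, turn_detection_prefix_padding_ms, and turn_detection_silence_duration_ms.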

See it in Action

Explore a complete, end-to-end implementation of an agent using this provider in our AI Agent Quickstart Guide.

Additional Resources

The following resources provide more information about using Azure Voice Live with the VideoSDK Agents SDK.
