Azure STT
The Azure STT provider enables your agent to use Microsoft Azure's advanced speech-to-text models for high-accuracy, real-time audio transcription with support for multiple languages and custom phrase lists.
Installation
Install the Azure-enabled VideoSDK Agents package:
pip install "videosdk-plugins-azure"
Importing
from videosdk.plugins.azure import AzureSTT
Authentication
The Azure STT plugin requires an Azure AI Speech Service resource.
Setup Steps:
- Create an AI Services resource for Speech in the Azure portal or from Azure AI Foundry.
- Get the Speech resource key and region: after your Speech resource is deployed, select "Go to resource" to view and manage keys.
Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION in your .env file:
AZURE_SPEECH_KEY=your-azure-speech-key
AZURE_SPEECH_REGION=your-azure-region
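Before constructing the provider, it can help to confirm that both variables actually reached the process environment. The following is a minimal sketch using only the standard library. The helper name `missing_azure_vars` is hypothetical (it is not part of the plugin), and the sketch assumes that something else, such as a dotenv loader or your shell, has already loaded the .env file into the environment.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_VARS = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION")


def missing_azure_vars(env=None):
    """Return the required Azure Speech variables that are unset or blank.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    This helper is illustrative only and is not part of the plugin.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_azure_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Azure Speech credentials found in the environment.")
```

Running a check like this at startup surfaces a missing key or region as a clear message rather than as a failure later in the pipeline.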
Example Usage
from videosdk.plugins.azure import AzureSTT
from videosdk.agents import CascadingPipeline
# Initialize the Azure STT model
stt = AzureSTT(
    language="en-US",
    sample_rate=16000,
    enable_phrase_list=True,
    phrase_list=["VideoSDK", "artificial intelligence", "machine learning"],
)
# Add stt to cascading pipeline
pipeline = CascadingPipeline(stt=stt)
Note: When using environment variables for credentials, don't pass speech_key and speech_region as arguments to the model instance. The SDK automatically reads the environment variables.
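The fallback described in the note can be pictured with a small sketch: an explicitly passed value is used when present, otherwise the matching environment variable is read. This illustrates the documented behavior only and is not the plugin's actual source. The function name `resolve_azure_credentials` is hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_azure_credentials(
    speech_key: Optional[str] = None,
    speech_region: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Prefer explicit arguments; otherwise fall back to the environment.

    Hypothetical helper that mirrors the documented fallback. It does not
    call the Azure SDK or the VideoSDK plugin.
    """
    env = os.environ if env is None else env
    key = speech_key or env.get("AZURE_SPEECH_KEY")
    region = speech_region or env.get("AZURE_SPEECH_REGION")
    return key, region


# With nothing passed explicitly, both values come from the environment mapping.
example_env = {"AZURE_SPEECH_KEY": "env-key", "AZURE_SPEECH_REGION": "eastus"}
print(resolve_azure_credentials(env=example_env))
```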
Configuration Options
- speech_key: (Optional[str]) Azure Speech API key. Uses AZURE_SPEECH_KEY environment variable if not provided.
- speech_region: (Optional[str]) Azure Speech region (e.g., "eastus", "westus2"). Uses AZURE_SPEECH_REGION environment variable if not provided.
- language: (str) The language code for transcription (default: "en-US"). See supported languages.
- sample_rate: (int) The target audio sample rate in Hz for transcription (default: 16000). The input audio at 48000 Hz will be resampled to this rate.
- enable_phrase_list: (bool) Whether to enable phrase list for better recognition accuracy (default: False).
- phrase_list: (Optional[List[str]]) List of phrases to boost recognition for domain-specific terms (default: None).
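To make the sample_rate option concrete, the arithmetic behind resampling 48000 Hz input to the default 16000 Hz target can be sketched as below. This only computes sample counts and does not reproduce the plugin's resampler. The function name `resampled_length` and the 20 ms frame duration are assumptions chosen for illustration.

```python
def resampled_length(num_samples: int, source_rate: int = 48000, target_rate: int = 16000) -> int:
    """Approximate sample count after resampling, rounded down.

    Illustrative arithmetic only; a real resampler also filters the signal.
    """
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    return num_samples * target_rate // source_rate


# A 20 ms frame (an assumed frame size) holds 960 samples at 48000 Hz.
frame_48k = 48000 * 20 // 1000
# At the 16000 Hz target, the same 20 ms holds one third as many samples.
frame_16k = resampled_length(frame_48k)
print(frame_48k, frame_16k)  # prints "960 320"
```

At the default settings the ratio is exactly 3:1, so each second of 48000 Hz input becomes 16000 samples for transcription.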
Additional Resources
The following resources provide more information about using Azure with the VideoSDK Agents SDK.
- Azure Speech Service Overview: Complete overview of Azure Speech services.
- Azure STT docs: Azure Speech-to-Text documentation.
- Getting Started Guide: Azure STT setup and prerequisites.