Version: 1.0.x

Azure STT

The Azure STT provider lets your agent use Microsoft Azure's speech-to-text models for real-time audio transcription, with support for multiple languages and custom phrase lists.

Installation

Install the Azure-enabled VideoSDK Agents package:

pip install "videosdk-plugins-azure"

Importing

from videosdk.plugins.azure import AzureSTT

Authentication

The Azure STT plugin requires an Azure AI Speech Service resource.

Setup Steps:

1. Create an AI Services resource for Speech in the Azure portal or from Azure AI Foundry.
2. Get the Speech resource key and region: after your Speech resource is deployed, select "Go to resource" to view and manage keys.

Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION in your .env file:

AZURE_SPEECH_KEY=your-azure-speech-key
AZURE_SPEECH_REGION=your-azure-region
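
Before starting the agent, it can help to confirm that both variables are actually visible to the process. The following is a minimal sketch using only the Python standard library. The helper name missing_azure_credentials is hypothetical and not part of the VideoSDK SDK, and the sketch assumes the .env file has already been loaded into the process environment by whatever mechanism your project uses.

```python
import os

# The two variables named in the setup steps above.
REQUIRED_VARS = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION")


def missing_azure_credentials(env=None):
    """Return the names of required variables that are unset or blank.

    Hypothetical helper for illustration; not part of the VideoSDK SDK.
    Pass a dict as `env` to check something other than os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling missing_azure_credentials() with no arguments checks the current process environment. An empty list means both settings are present; otherwise the list names the variables that still need to be set.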

Example Usage

from videosdk.plugins.azure import AzureSTT
from videosdk.agents import Pipeline

# Initialize the Azure STT model
stt = AzureSTT(
    language="en-US",
    sample_rate=16000,
    enable_phrase_list=True,
    phrase_list=["VideoSDK", "artificial intelligence", "machine learning"]
)

# Add the STT model to the pipeline
pipeline = Pipeline(stt=stt)

Note: When credentials are supplied through environment variables, do not pass speech_key and speech_region as arguments to the model instance. The SDK reads the environment variables automatically.

Configuration Options

speech_key: (Optional[str]) Azure Speech API key. Uses AZURE_SPEECH_KEY environment variable if not provided.
speech_region: (Optional[str]) Azure Speech region (e.g., "eastus", "westus2"). Uses AZURE_SPEECH_REGION environment variable if not provided.
language: (str) The language code for transcription (default: "en-US"). See supported languages.
sample_rate: (int) The target audio sample rate in Hz for transcription (default: 16000). Input audio at 48000 Hz is resampled to this rate.
enable_phrase_list: (bool) Whether to enable phrase list for better recognition accuracy (default: False).
phrase_list: (Optional[List[str]]) List of phrases to boost recognition for domain-specific terms (default: None).
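
The options above can be read as a small resolution step: explicit arguments take precedence, the environment variables fill in missing credentials, and the remaining options fall back to their documented defaults. The sketch below mirrors that description in plain Python. resolve_azure_stt_options is a hypothetical helper for illustration only; it is not how the plugin is implemented internally, and the handling of phrase_list when enable_phrase_list is False is an assumption.

```python
import os
from typing import Any, Dict, List, Mapping, Optional


def resolve_azure_stt_options(
    speech_key: Optional[str] = None,
    speech_region: Optional[str] = None,
    language: str = "en-US",
    sample_rate: int = 16000,
    enable_phrase_list: bool = False,
    phrase_list: Optional[List[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Return the effective settings described in Configuration Options.

    Hypothetical helper; not part of the VideoSDK SDK.
    """
    env = os.environ if env is None else env

    # Explicit arguments win; otherwise fall back to the environment.
    key = speech_key or env.get("AZURE_SPEECH_KEY")
    region = speech_region or env.get("AZURE_SPEECH_REGION")
    if not key or not region:
        raise ValueError("Azure Speech key and region must be provided")

    # Assumption: phrases only take effect when the phrase list is enabled.
    phrases = list(phrase_list) if enable_phrase_list and phrase_list else None

    return {
        "speech_key": key,
        "speech_region": region,
        "language": language,
        "sample_rate": sample_rate,
        "enable_phrase_list": enable_phrase_list,
        "phrase_list": phrases,
    }
```

With both environment variables set and no arguments passed, the result carries the environment credentials together with the defaults "en-US" and 16000.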

Additional Resources

The following resources provide more information about using Azure with the VideoSDK Agents SDK.

Azure Speech Service Overview: Complete overview of Azure Speech services.
Azure STT docs: Azure Speech-to-Text documentation.
Getting Started Guide: Azure STT setup and prerequisites.

SDK Reference

GitHub Repository

Python Package
