
Azure STT

The Azure STT provider lets your agent transcribe audio in real time using Microsoft Azure's speech-to-text models, with support for multiple languages and custom phrase lists.

Installation

Install the Azure-enabled VideoSDK Agents package:

pip install "videosdk-plugins-azure"

Importing

from videosdk.plugins.azure import AzureSTT

Authentication

The Azure STT plugin requires an Azure AI Speech Service resource.

Setup Steps:

  1. Create an AI Services resource for Speech in the Azure portal or from Azure AI Foundry.
  2. Get the Speech resource key and region. After your Speech resource is deployed, select "Go to resource" to view and manage keys.

Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION in your .env file:

AZURE_SPEECH_KEY=your-azure-speech-key
AZURE_SPEECH_REGION=your-azure-region
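
Before starting an agent, it can help to confirm that both variables are actually visible to the process. The following is a minimal sketch using only the standard library; the helper names `missing_azure_vars` and `require_azure_vars` are hypothetical and not part of the VideoSDK or Azure SDKs. It assumes the .env file has already been loaded into the environment (for example with python-dotenv's `load_dotenv()`, if you use that package).

```python
import os

# The two variable names this page asks you to set.
REQUIRED_VARS = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION")


def missing_azure_vars(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the
    check easy to test. This helper is illustrative only.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_azure_vars(env=None):
    """Raise a RuntimeError naming any missing variables."""
    missing = missing_azure_vars(env)
    if missing:
        raise RuntimeError("Missing Azure Speech settings: " + ", ".join(missing))
```

Calling `require_azure_vars()` at startup turns a missing credential into an immediate, readable error instead of a failure later during transcription.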

Example Usage

from videosdk.plugins.azure import AzureSTT
from videosdk.agents import CascadingPipeline

# Initialize the Azure STT model
stt = AzureSTT(
    language="en-US",
    sample_rate=16000,
    enable_phrase_list=True,
    phrase_list=["VideoSDK", "artificial intelligence", "machine learning"]
)

# Add stt to cascading pipeline
pipeline = CascadingPipeline(stt=stt)
Note: When you supply credentials through environment variables, do not pass speech_key and speech_region as arguments to the model instance. The SDK reads the environment variables automatically.
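
If the same code sometimes runs with environment variables and sometimes with explicit credentials, one way to handle both cases is to build the constructor arguments conditionally. This is a sketch under stated assumptions: `azure_stt_kwargs` is a hypothetical helper, and it relies only on the `speech_key` and `speech_region` parameter names listed under Configuration Options.

```python
def azure_stt_kwargs(speech_key=None, speech_region=None, **options):
    """Build keyword arguments for AzureSTT (hypothetical helper).

    Credentials left as None are omitted entirely, so the SDK can
    fall back to AZURE_SPEECH_KEY / AZURE_SPEECH_REGION as described
    in the note above.
    """
    kwargs = dict(options)
    if speech_key is not None:
        kwargs["speech_key"] = speech_key
    if speech_region is not None:
        kwargs["speech_region"] = speech_region
    return kwargs


# Usage (requires the videosdk-plugins-azure package):
# from videosdk.plugins.azure import AzureSTT
# stt = AzureSTT(**azure_stt_kwargs(language="en-US"))            # env vars
# stt = AzureSTT(**azure_stt_kwargs("my-key", "eastus",
#                                   language="en-US"))            # explicit
```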

Configuration Options

  • speech_key: (Optional[str]) Azure Speech API key. Uses AZURE_SPEECH_KEY environment variable if not provided.
  • speech_region: (Optional[str]) Azure Speech region (e.g., "eastus", "westus2"). Uses AZURE_SPEECH_REGION environment variable if not provided.
  • language: (str) The language code for transcription (default: "en-US"). See supported languages.
  • sample_rate: (int) The target audio sample rate in Hz for transcription (default: 16000). Input audio arriving at 48000 Hz is resampled to this rate.
  • enable_phrase_list: (bool) Whether to enable phrase list for better recognition accuracy (default: False).
  • phrase_list: (Optional[List[str]]) List of phrases to boost recognition for domain-specific terms (default: None).
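
Because enable_phrase_list and phrase_list are separate options, it is easy to enable the feature with an empty list or to supply phrases while leaving it disabled. The sketch below derives both values from one list of raw phrases. `phrase_list_options` is a hypothetical helper, not part of the plugin, and the trimming and case-insensitive de-duplication are illustrative choices rather than SDK requirements.

```python
def phrase_list_options(phrases):
    """Return enable_phrase_list / phrase_list kwargs from raw phrases.

    Blank entries are dropped and case-insensitive duplicates are
    removed, keeping the first spelling seen. With no usable phrases
    the result matches the documented defaults (False and None).
    """
    seen = set()
    cleaned = []
    for phrase in phrases or []:
        text = phrase.strip()
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return {
        "enable_phrase_list": bool(cleaned),
        "phrase_list": cleaned or None,
    }


# Usage (requires the videosdk-plugins-azure package):
# from videosdk.plugins.azure import AzureSTT
# stt = AzureSTT(language="en-US",
#                **phrase_list_options(["VideoSDK", "machine learning"]))
```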

