Azure OpenAI STT
The Azure OpenAI STT provider enables your agent to use Azure OpenAI's speech-to-text models (like Whisper) for converting audio input to text.
Installation
Install the Azure OpenAI-enabled VideoSDK Agents package:
pip install "videosdk-plugins-openai"
Authentication
The Azure OpenAI plugin requires either an Azure OpenAI API key or an Azure Active Directory token.
For API-key authentication, set AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and OPENAI_API_VERSION in your .env file.
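A missing variable usually surfaces only when the first request fails, so it can help to check the environment up front. The following is a minimal sketch, assuming API-key authentication and that the .env file has already been loaded into the process environment. The helper name missing_azure_vars is hypothetical and not part of the VideoSDK plugin.

```python
import os

# The three variables named in this section (API-key authentication).
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
)


def missing_azure_vars(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_azure_vars() before constructing the STT instance returns an empty list when all three variables are present, and otherwise the names that still need to be set.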
Importing
from videosdk.agents.plugins import OpenAISTT
Example Usage
from videosdk.agents.plugins import OpenAISTT
from videosdk.agents import Pipeline
# Initialize the Azure OpenAI STT model
stt = OpenAISTT.azure(
    azure_deployment="gpt-4o-transcribe",
    language="en",
)
# Add stt to pipeline
pipeline = Pipeline(stt=stt)
note
When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.
Configuration Options
model: (str) The model to use for the STT plugin (default: "gpt-4o-mini-transcribe")
language: (str) Language code for transcription (default: "en")
prompt: (str, optional) The prompt for the STT plugin (default: None)
turn_detection: (dict, optional) The turn detection configuration for the STT plugin (default: None)
azure_endpoint: (str, optional) Your Azure OpenAI Deployment Endpoint URL. Uses the AZURE_OPENAI_ENDPOINT environment variable if not provided.
azure_deployment: (str, optional) The OpenAI deployment ID to use. Uses the AZURE_OPENAI_DEPLOYMENT environment variable if not provided; if still unset, the model name is used as the deployment name.
api_version: (str, optional) Your Azure OpenAI API version. Uses the OPENAI_API_VERSION environment variable if not provided.
api_key: (str, optional) Your Azure OpenAI API key. Uses the AZURE_OPENAI_API_KEY environment variable if not provided.
azure_ad_token: (str, optional) Azure Active Directory token. Uses the AZURE_OPENAI_AD_TOKEN environment variable if not provided.
organization: (str, optional) The OpenAI organization ID. Uses the OPENAI_ORG_ID environment variable if not provided.
project: (str, optional) The OpenAI project ID. Uses the OPENAI_PROJECT_ID environment variable if not provided.
base_url: (str, optional) The base URL for the Azure OpenAI API (default: None)
enable_streaming: (bool) Whether to enable streaming transcription (default: False)
timeout: (httpx.Timeout, optional) Request timeout configuration. Defaults to a timeout of connect=15.0, read=5.0, write=5.0, pool=5.0 if not provided.
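The fallback order described for azure_deployment (explicit argument, then the AZURE_OPENAI_DEPLOYMENT environment variable, then the model name) can be sketched as below. This is an illustration of the documented behavior only, not the plugin's actual implementation. The function name resolve_deployment is hypothetical.

```python
import os


def resolve_deployment(azure_deployment=None,
                       model="gpt-4o-mini-transcribe",
                       env=None):
    """Pick a deployment name using the documented fallback order.

    Hypothetical helper for illustration: the real plugin may resolve
    this differently internally. `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    # 1. explicit argument, 2. environment variable, 3. model name
    return azure_deployment or env.get("AZURE_OPENAI_DEPLOYMENT") or model
```

Under this reading, omitting azure_deployment works only if the Azure deployment is named after the model or the environment variable is set. Passing it explicitly, as in the usage example, avoids relying on either.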
Additional Resources
The following resources provide more information about using OpenAI with the VideoSDK Agents SDK.

