Azure OpenAI STT
The Azure OpenAI STT provider enables your agent to use Azure OpenAI's speech-to-text models (like Whisper) for converting audio input to text.
Installation
Install the Azure OpenAI-enabled VideoSDK Agents package:
pip install "videosdk-plugins-openai"
Authentication
The Azure OpenAI plugin requires an Azure OpenAI API key.
Set AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and OPENAI_API_VERSION in your .env file.
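A missing variable is easiest to diagnose before the agent starts. The following is a minimal sketch, not part of the VideoSDK plugin: the helper name `missing_azure_vars` is hypothetical, and it only checks that the three variables named above are present and non-empty.

```python
import os

# Variable names taken from the Authentication section of this page.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
)


def missing_azure_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to try out without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Demonstration with a hand-built mapping (placeholder values only).
example_env = {
    "AZURE_OPENAI_API_KEY": "placeholder-key",
    "AZURE_OPENAI_ENDPOINT": "",  # empty counts as missing
}
print(missing_azure_vars(example_env))
# prints ['AZURE_OPENAI_ENDPOINT', 'OPENAI_API_VERSION']
```

Calling `missing_azure_vars()` with no argument inspects the real environment, so it could run once at startup, after the .env file has been loaded, to fail early with a clear message.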
Importing
from videosdk.plugins.openai import OpenAISTT
Example Usage
from videosdk.plugins.openai import OpenAISTT
from videosdk.agents import CascadingPipeline
# Initialize the Azure OpenAI STT model
stt = OpenAISTT.azure(
    azure_deployment="gpt-4o-transcribe",
    language="en",
)
# Add stt to cascading pipeline
pipeline = CascadingPipeline(stt=stt)
Note
When you keep credentials in a .env file, don't also pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.
Configuration Options
azure_deployment: The Azure OpenAI deployment ID to use. By default it is the model name, e.g., "gpt-4o-mini-transcribe" or "gpt-4o-transcribe".
api_key: Your Azure OpenAI API key (can also be set via environment variable).
azure_endpoint: Your Azure OpenAI deployment endpoint URL (can also be set via environment variable).
api_version: Your Azure OpenAI API version (can also be set via environment variable).
language: (str) Language code for transcription (default: "en").
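When some options are set in code and others are left to environment variables, one approach is to build the keyword arguments first and drop anything that is unset. The sketch below is illustrative only: `azure_stt_kwargs` is a hypothetical helper, the parameter names are copied from the list above, and it assumes that omitting a credential argument lets the SDK fall back to the matching environment variable.

```python
def azure_stt_kwargs(
    azure_deployment,
    language="en",
    api_key=None,
    azure_endpoint=None,
    api_version=None,
):
    """Collect keyword arguments for OpenAISTT.azure.

    Credential-style options left as None are omitted entirely, on the
    assumption that the SDK then reads them from the environment.
    """
    kwargs = {"azure_deployment": azure_deployment, "language": language}
    optional = {
        "api_key": api_key,
        "azure_endpoint": azure_endpoint,
        "api_version": api_version,
    }
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


# Only the deployment and API version are set explicitly here;
# "<your-api-version>" is a placeholder, not a real version string.
kwargs = azure_stt_kwargs("gpt-4o-transcribe", api_version="<your-api-version>")
print(kwargs)

# With the plugin installed, the result would be passed on as:
#   from videosdk.plugins.openai import OpenAISTT
#   stt = OpenAISTT.azure(**kwargs)
```

Keeping the dictionary separate from the constructor call makes it simple to log or test which options were set explicitly, without printing the API key itself unless it was passed in.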
Additional Resources
The following resources provide more information about using OpenAI with VideoSDK Agents SDK.