Version: 1.0.x

Azure OpenAI STT

The Azure OpenAI STT provider enables your agent to use Azure OpenAI's speech-to-text models (like Whisper) for converting audio input to text.

Installation

Install the Azure OpenAI-enabled VideoSDK Agents package:

pip install "videosdk-plugins-openai"

Authentication

The Azure OpenAI plugin requires either an Azure OpenAI API key or an Azure Active Directory token (see azure_ad_token under Configuration Options).

Set AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and OPENAI_API_VERSION in your .env file.
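
Before constructing the provider, it can help to confirm that these variables are actually present. The following is a minimal sketch using only the Python standard library. The helper name missing_azure_env is hypothetical and not part of the VideoSDK SDK, and the sketch assumes the .env file has already been loaded into the process environment.

```python
import os

# Variable names taken from the Authentication section of this page.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
)


def missing_azure_env(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of the VideoSDK SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_env()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All Azure OpenAI variables are set.")
```

Running this check at startup surfaces a missing credential as a clear message rather than as a failure later, when the first transcription request is made.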

Importing

from videosdk.agents.plugins import OpenAISTT

Example Usage

from videosdk.agents.plugins import OpenAISTT
from videosdk.agents import Pipeline

# Initialize the Azure OpenAI STT model
stt = OpenAISTT.azure(
    azure_deployment="gpt-4o-transcribe",
    language="en",
)

# Add stt to pipeline
pipeline = Pipeline(stt=stt)
Note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.

Configuration Options

  • model: (str) The model to use for the STT plugin (default: "gpt-4o-mini-transcribe")
  • language: (str) Language code for transcription (default: "en")
  • prompt: (str, optional) The prompt for the STT plugin (default: None)
  • turn_detection: (dict, optional) The turn detection configuration for the STT plugin (default: None)
  • azure_endpoint: (str, optional) Your Azure OpenAI Deployment Endpoint URL. Uses the AZURE_OPENAI_ENDPOINT environment variable if not provided.
  • azure_deployment: (str, optional) The OpenAI deployment ID to use. Uses the AZURE_OPENAI_DEPLOYMENT environment variable if not provided; if still unset, the model name is used as the deployment name.
  • api_version: (str, optional) Your Azure OpenAI API version. Uses the OPENAI_API_VERSION environment variable if not provided.
  • api_key: (str, optional) Your Azure OpenAI API key. Uses the AZURE_OPENAI_API_KEY environment variable if not provided.
  • azure_ad_token: (str, optional) Azure Active Directory token. Uses the AZURE_OPENAI_AD_TOKEN environment variable if not provided.
  • organization: (str, optional) The OpenAI organization ID. Uses the OPENAI_ORG_ID environment variable if not provided.
  • project: (str, optional) The OpenAI project ID. Uses the OPENAI_PROJECT_ID environment variable if not provided.
  • base_url: (str, optional) The base URL for the Azure OpenAI API (default: None)
  • enable_streaming: (bool) Whether to enable streaming transcription (default: False)
  • timeout: (httpx.Timeout, optional) Request timeout configuration. Defaults to a timeout of connect=15.0, read=5.0, write=5.0, pool=5.0 if not provided.
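
The fallback order described for azure_deployment (explicit argument, then the AZURE_OPENAI_DEPLOYMENT environment variable, then the model name) can be illustrated with a small sketch. The function resolve_deployment is hypothetical and written only to mirror the documented behavior; the SDK's internal implementation may differ.

```python
import os


def resolve_deployment(azure_deployment=None, model="gpt-4o-mini-transcribe", env=None):
    """Illustrate the documented fallback order for the deployment name.

    Hypothetical helper; not the SDK's actual implementation.
    """
    env = os.environ if env is None else env
    # 1. An explicitly passed deployment name takes precedence.
    if azure_deployment:
        return azure_deployment
    # 2. Otherwise use AZURE_OPENAI_DEPLOYMENT if it is set and non-empty.
    # 3. If that is also unset, fall back to the model name.
    return env.get("AZURE_OPENAI_DEPLOYMENT") or model
```

Under this reading, leaving azure_deployment unset only works if your Azure deployment is named after the model, or if AZURE_OPENAI_DEPLOYMENT is defined.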

Additional Resources

The following resources provide more information about using OpenAI with the VideoSDK Agents SDK.
