OpenAI STT
The OpenAI STT provider enables your agent to use OpenAI's speech-to-text models (like Whisper) for converting audio input to text.
Installation
Install the OpenAI-enabled VideoSDK Agents package:
pip install "videosdk-plugins-openai"
Authentication
The OpenAI plugin requires an OpenAI API key.
Set OPENAI_API_KEY in your .env file.
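Before starting the agent, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch using only the standard library. It assumes the .env file has already been loaded into the process environment (by the SDK or by a loader of your choice); the helper name openai_key_is_set is hypothetical and not part of the VideoSDK or OpenAI packages.

```python
import os


def openai_key_is_set(env=None):
    """Return True if OPENAI_API_KEY is present and non-empty.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching real settings.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if not openai_key_is_set():
        print("OPENAI_API_KEY is not set; add it to .env or export it in your shell")
```

A check like this only reports whether the variable exists. It does not validate the key against the OpenAI API.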
Importing
from videosdk.agents.plugins import OpenAISTT
Example Usage
from videosdk.agents.plugins import OpenAISTT
from videosdk.agents import Pipeline

# Initialize the OpenAI STT model
stt = OpenAISTT(
    # Pass api_key only if OPENAI_API_KEY is NOT set in .env; otherwise omit this argument
    api_key="your-openai-api-key",
    model="whisper-1",
    language="en",
    prompt="Transcribe this audio with proper punctuation and formatting.",
)

# Add stt to pipeline
pipeline = Pipeline(stt=stt)
Note: When you store credentials in a .env file, don't pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.
Configuration Options
- api_key: Your OpenAI API key (required, can also be set via environment variable)
- model: The OpenAI STT model to use (default: "gpt-4o-mini-transcribe")
- base_url: Custom base URL for OpenAI API (optional)
- prompt: (str) Custom prompt to guide transcription style and format
- language: (str) Language code for transcription (default: "en")
- turn_detection: (dict) Configuration for detecting conversation turns
- enable_streaming: (bool) Whether to use streaming transcription mode (default: True)
- silence_threshold: (float) Amplitude threshold below which audio is treated as silence, used by the custom VAD in non-streaming mode (default: 0.01)
- silence_duration: (float) Duration of silence in seconds before an utterance is finalized in non-streaming mode (default: 0.8)
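To build intuition for how silence_threshold and silence_duration interact in non-streaming mode, here is a simplified, standalone sketch of amplitude-based utterance splitting. It is not the plugin's actual VAD implementation. The function name split_utterances, the sample_rate argument, and the assumption that samples are floats in the range -1.0 to 1.0 are all illustrative.

```python
def split_utterances(samples, sample_rate, silence_threshold=0.01, silence_duration=0.8):
    """Return (start, end) sample-index pairs, end exclusive, one per utterance.

    A sample counts as silence when its absolute amplitude is below
    `silence_threshold`. An utterance is finalized once at least
    `silence_duration` seconds of silence have followed its last loud sample.
    """
    min_silent = int(silence_duration * sample_rate)  # silence length in samples
    utterances = []
    start = None      # index where the current utterance began
    last_loud = None  # index of the most recent non-silent sample

    for i, sample in enumerate(samples):
        if abs(sample) >= silence_threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_silent:
            # Enough trailing silence: close the utterance at the last loud sample.
            utterances.append((start, last_loud + 1))
            start = None

    # Flush an utterance that was still open when the audio ended.
    if start is not None:
        utterances.append((start, last_loud + 1))
    return utterances
```

In this sketch, raising silence_threshold makes quiet speech more likely to be treated as silence, while raising silence_duration tolerates longer pauses inside a single utterance at the cost of slower finalization.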
Additional Resources
The following resources provide more information about using OpenAI with the VideoSDK Agents SDK.
- OpenAI docs: OpenAI STT API documentation.

