OpenAI STT

The OpenAI STT provider lets your agent use OpenAI's speech-to-text models, such as Whisper, to convert audio input to text.

Installation

Install the OpenAI-enabled VideoSDK Agents package:

pip install "videosdk-plugins-openai"

Authentication

The OpenAI plugin requires an OpenAI API key.

Set OPENAI_API_KEY in your .env file.
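A missing key tends to surface only when the first transcription request fails, so it can help to check for it at startup. The snippet below is a minimal sketch using only the standard library. The helper name require_openai_key is hypothetical and not part of the VideoSDK plugin, and the sketch assumes your .env file has already been loaded into the process environment (for example by a tool such as python-dotenv).

```python
import os


def require_openai_key(env=os.environ):
    """Return OPENAI_API_KEY from the given environment mapping, or raise.

    Hypothetical helper for illustration; not part of the VideoSDK plugin.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it to your .env file "
            "or export it in your shell before starting the agent."
        )
    return key


# Typical use at startup, before constructing the STT instance:
# require_openai_key()
```

Passing the environment as an argument keeps the check easy to exercise with a plain dictionary instead of real credentials.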

Importing

from videosdk.plugins.openai import OpenAISTT

Example Usage

from videosdk.plugins.openai import OpenAISTT
from videosdk.agents import CascadingPipeline

# Initialize the OpenAI STT model
stt = OpenAISTT(
    # Omit the api_key argument when OPENAI_API_KEY is already set in your .env file
    api_key="your-openai-api-key",
    model="whisper-1",
    language="en",
    prompt="Transcribe this audio with proper punctuation and formatting."
)

# Add the STT instance to the cascading pipeline
pipeline = CascadingPipeline(stt=stt)

Note

When you use a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.

Configuration Options

api_key: Your OpenAI API key (required; can be supplied through the OPENAI_API_KEY environment variable instead of this argument)
model: The OpenAI STT model to use (e.g., "whisper-1", "gpt-4o-mini-transcribe")
base_url: Custom base URL for OpenAI API (optional)
prompt: (str) Custom prompt to guide transcription style and format
language: (str) Language code for transcription (default: "en")
turn_detection: (dict) Configuration for detecting conversation turns
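One way to keep these options together is to collect them in a dictionary and unpack it into the constructor. The sketch below is illustrative only: build_stt_options is a hypothetical helper, not part of the SDK, and it uses just the option names listed above. It leaves out api_key on the assumption that the key comes from the environment, and it skips turn_detection because its dictionary format is not described here.

```python
def build_stt_options(model="whisper-1", language="en", prompt=None, base_url=None):
    """Collect keyword arguments using the option names listed above.

    Hypothetical helper for illustration; not part of the SDK.
    api_key is deliberately omitted so the key is read from the environment.
    """
    options = {"model": model, "language": language}
    # Include optional settings only when they were actually provided.
    if prompt is not None:
        options["prompt"] = prompt
    if base_url is not None:
        options["base_url"] = base_url
    return options


options = build_stt_options(prompt="Transcribe with proper punctuation.")
print(options)

# With the plugin installed, the dictionary can be unpacked into the constructor:
# stt = OpenAISTT(**options)
```

Leaving unset options out of the dictionary, rather than passing None, lets the constructor fall back to its own defaults.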

Additional Resources

The following resources provide more information about using OpenAI with the VideoSDK Agents SDK.

Python package: The videosdk-plugins-openai package on PyPI.
GitHub repo: View the source or contribute to the VideoSDK OpenAI STT plugin.
OpenAI docs: OpenAI STT API documentation.

