OpenAI

The OpenAI provider enables your agent to use OpenAI's real-time models (like GPT-4o) for text and audio interactions.

Installation

Install the OpenAI-enabled VideoSDK Agents package:

pip install "videosdk-plugins-openai"

Importing

from videosdk.plugins.openai import OpenAIRealtime, OpenAIRealtimeConfig

Example Usage

from videosdk.plugins.openai import OpenAIRealtime, OpenAIRealtimeConfig
from videosdk.agents import RealTimePipeline
from openai.types.beta.realtime.session import TurnDetection

# Initialize the OpenAI real-time model
model = OpenAIRealtime(
    model="gpt-4o-realtime-preview",
    # Pass api_key only if OPENAI_API_KEY is not already set in your .env file
    api_key="your-openai-api-key",
    config=OpenAIRealtimeConfig(
        voice="alloy", # alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, and verse
        modalities=["text", "audio"],
        turn_detection=TurnDetection(
            type="server_vad",
            threshold=0.5,
            prefix_padding_ms=300,
            silence_duration_ms=200,
        ),
        tool_choice="auto"
    )
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)

note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.
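The rule in this note can be sketched as a small helper that decides whether to pass api_key at all. This is a minimal illustration using only the standard library: the helper name credential_kwargs is hypothetical and not part of the SDK, and it assumes OPENAI_API_KEY is the variable the SDK reads, as the example above suggests.

```python
import os


def credential_kwargs(explicit_key=None, environ=None):
    """Return keyword arguments for the model constructor.

    Hypothetical helper (not part of the VideoSDK API): returns an empty
    dict when OPENAI_API_KEY is already set, so the SDK can read it itself.
    """
    env = os.environ if environ is None else environ
    if env.get("OPENAI_API_KEY"):
        # Credential comes from the environment: do not pass api_key.
        return {}
    if explicit_key:
        # No environment variable: fall back to an explicit key.
        return {"api_key": explicit_key}
    raise ValueError("Set OPENAI_API_KEY or provide an explicit key")


# Intended use (assumes the plugin is installed):
# model = OpenAIRealtime(model="gpt-4o-realtime-preview", **credential_kwargs())
```

Keeping the decision in one place makes it harder to end up with a key both in the environment and hard-coded in the source.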

Configuration Options

model: The OpenAI model to use (e.g., "gpt-4o-realtime-preview")
api_key: Your OpenAI API key (can also be set via the OPENAI_API_KEY environment variable)
config: An OpenAIRealtimeConfig object for advanced options:
- voice: (str) The voice to use for audio output (e.g., "alloy").
- temperature: (float) Sampling temperature for response randomness.
- turn_detection: (TurnDetection or None) Configure how the agent detects when a user has finished speaking.
- input_audio_transcription: (InputAudioTranscription or None) Configure audio-to-text (e.g., Whisper).
- tool_choice: (str or None) Tool selection mode (e.g., "auto").
- modalities: (list[str]) List of enabled modalities (e.g., ["text", "audio"]).
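To make the turn_detection values concrete, the sketch below collects the server VAD settings from the example into a dictionary and checks them before they are handed to TurnDetection. The helper name server_vad_settings and the range checks are assumptions made for illustration (the 0.0 to 1.0 threshold range follows OpenAI's Realtime API documentation); the SDK does not require this step.

```python
def server_vad_settings(threshold=0.5, prefix_padding_ms=300, silence_duration_ms=200):
    """Build keyword arguments for a server-VAD TurnDetection object.

    Hypothetical helper for illustration; defaults mirror the example above.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if prefix_padding_ms < 0 or silence_duration_ms < 0:
        raise ValueError("durations must be non-negative milliseconds")
    return {
        "type": "server_vad",
        "threshold": threshold,
        "prefix_padding_ms": prefix_padding_ms,
        "silence_duration_ms": silence_duration_ms,
    }


# A longer silence window waits longer before treating the turn as finished.
patient = server_vad_settings(silence_duration_ms=800)

# Intended use (assumes the plugin and the openai package are installed):
# config = OpenAIRealtimeConfig(voice="alloy", turn_detection=TurnDetection(**patient))
```

A shorter silence_duration_ms makes the agent respond sooner but risks cutting in during natural pauses, while a longer value trades responsiveness for fewer interruptions.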

tip

Explore ready-made scripts for using OpenAI with the VideoSDK AI Agent SDK: OpenAI Example Script.
