OpenAI

The OpenAI provider lets your agent use OpenAI's real-time models (such as gpt-4o-realtime-preview) for text and audio interactions.

Installation

Install the OpenAI-enabled VideoSDK Agents package:

pip install "videosdk-plugins-openai"

Importing

from videosdk.plugins.openai import OpenAIRealtime, OpenAIRealtimeConfig

Example Usage

from videosdk.plugins.openai import OpenAIRealtime, OpenAIRealtimeConfig
from videosdk.agents import RealTimePipeline
from openai.types.beta.realtime.session import TurnDetection

# Initialize the OpenAI real-time model
model = OpenAIRealtime(
    model="gpt-4o-realtime-preview",
    # Omit api_key entirely when OPENAI_API_KEY is set in the environment or a .env file
    api_key="your-openai-api-key",
    config=OpenAIRealtimeConfig(
        # Voice options: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse
        voice="alloy",
        modalities=["text", "audio"],
        # Server-side voice activity detection decides when the user has finished speaking
        turn_detection=TurnDetection(
            type="server_vad",
            threshold=0.5,
            prefix_padding_ms=300,
            silence_duration_ms=200,
        ),
        tool_choice="auto",
    ),
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)
Note: When using a .env file for credentials, do not also pass them as arguments to model instances or context objects. The SDK reads environment variables automatically, so omit api_key, videosdk_auth, and other credential parameters from your code.
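
The rule in the note above can be sketched as a small helper that forwards api_key only when the environment does not already supply one. The name openai_model_kwargs and its parameters are hypothetical and not part of the SDK. The sketch assumes the SDK reads OPENAI_API_KEY from the process environment, as described above, and that any .env file has already been loaded.

```python
import os


def openai_model_kwargs(model, explicit_key=None, env=None):
    """Build keyword arguments for OpenAIRealtime (hypothetical helper).

    api_key is included only when OPENAI_API_KEY is absent from the
    environment, so the SDK's own environment lookup is left to do its job.
    """
    env = os.environ if env is None else env
    kwargs = {"model": model}
    if env.get("OPENAI_API_KEY"):
        # The key is already available to the SDK: do not pass it again.
        return kwargs
    if explicit_key is None:
        raise ValueError("Set OPENAI_API_KEY or pass explicit_key")
    kwargs["api_key"] = explicit_key
    return kwargs
```

With a helper like this, the model could be created as OpenAIRealtime(**openai_model_kwargs("gpt-4o-realtime-preview"), config=...), keeping credential handling in one place.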

Configuration Options

  • model: The OpenAI model to use (e.g., "gpt-4o-realtime-preview")
  • api_key: Your OpenAI API key (can also be set via the OPENAI_API_KEY environment variable)
  • config: An OpenAIRealtimeConfig object for advanced options:
    • voice: (str) The voice to use for audio output (e.g., "alloy").
    • temperature: (float) Sampling temperature for response randomness.
    • turn_detection: (TurnDetection or None) Configure how the agent detects when a user has finished speaking.
    • input_audio_transcription: (InputAudioTranscription or None) Configure audio-to-text (e.g., Whisper).
    • tool_choice: (str or None) Tool selection mode (e.g., "auto").
    • modalities: (list[str]) List of enabled modalities (e.g., ["text", "audio"]).
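
As a rough illustration of how the options above fit together, the following sketch assembles the keyword arguments for OpenAIRealtimeConfig as a plain dictionary and rejects obviously invalid values before the SDK is called. The function realtime_config_kwargs and the two KNOWN_* sets are hypothetical. The voice names are taken from the example on this page, and limiting modalities to "text" and "audio" is an assumption based on the same example.

```python
# Voice names as listed in the usage example on this page.
KNOWN_VOICES = {
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "onyx", "nova", "sage", "shimmer", "verse",
}
# Assumption: only the two modalities shown in the example are valid.
KNOWN_MODALITIES = {"text", "audio"}


def realtime_config_kwargs(voice="alloy", modalities=("text", "audio"),
                           temperature=None, tool_choice="auto"):
    """Return a dict suitable for OpenAIRealtimeConfig(**kwargs) (hypothetical helper)."""
    if voice not in KNOWN_VOICES:
        raise ValueError(f"Unknown voice: {voice!r}")
    unknown = set(modalities) - KNOWN_MODALITIES
    if unknown:
        raise ValueError(f"Unknown modalities: {sorted(unknown)}")
    kwargs = {
        "voice": voice,
        "modalities": list(modalities),
        "tool_choice": tool_choice,
    }
    # Leave temperature out unless set, so the SDK default applies.
    if temperature is not None:
        kwargs["temperature"] = float(temperature)
    return kwargs
```

The result can then be unpacked into the config object, for example OpenAIRealtimeConfig(**realtime_config_kwargs(voice="sage", temperature=0.8)). Options such as turn_detection and input_audio_transcription take typed objects and are left out of this sketch.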
Tip: Explore the ready-made scripts for using OpenAI with the VideoSDK AI Agent SDK. See the OpenAI Example Script.
