Version: 1.0.x

Ultravox

The Ultravox provider enables your agent to use Ultravox's models for real-time, conversational AI interactions.

Installation

Install the Ultravox-enabled VideoSDK Agents package:

pip install "videosdk-plugins-ultravox"

Authentication

The Ultravox plugin requires an Ultravox API key.

Set ULTRAVOX_API_KEY in your .env file, or export it as an environment variable.
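
A missing key is easier to diagnose at startup than in the middle of a call. The following is a minimal sketch of such a check, assuming the key is visible as an environment variable by the time the agent starts (for example, after your .env file has been loaded). The helper name require_ultravox_key is hypothetical and not part of the SDK.

```python
import os
from typing import Mapping, Optional


def require_ultravox_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return ULTRAVOX_API_KEY, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of the VideoSDK Agents SDK.
    """
    # Fall back to the process environment when no mapping is supplied.
    source = os.environ if env is None else env
    key = source.get("ULTRAVOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ULTRAVOX_API_KEY is not set; add it to your .env file "
            "or export it in the shell."
        )
    return key
```

Calling require_ultravox_key() before constructing the model surfaces a configuration problem as a single readable error.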

Importing

from videosdk.agents.plugins import UltravoxRealtime, UltravoxLiveConfig

Example Usage

from videosdk.agents.plugins import UltravoxRealtime, UltravoxLiveConfig
from videosdk.agents import Pipeline

# Initialize the Ultravox real-time model
model = UltravoxRealtime(
    model="fixie-ai/ultravox",
    # Pass api_key only when ULTRAVOX_API_KEY is not set in .env; otherwise omit it
    api_key="your-ultravox-api-key",
    config=UltravoxLiveConfig(
        voice="54ebeae1-88df-4d66-af13-6c41283b4332"
    ),
)

# Create the pipeline with the model
pipeline = Pipeline(llm=model)

Note: When using a .env file for credentials, you do not need to pass api_key as an argument to the model instance; the SDK reads it automatically.

Key Features

  • Real-time Interactions: Use Ultravox's models for low-latency voice conversations.
  • Function Calling: Let your agent perform actions such as retrieving weather data or calling external APIs.
  • Custom Agent Behaviors: Define a unique personality and interaction style for your agent through system prompts.
  • Call Control: Agents can manage the conversation flow and gracefully terminate calls.
  • MCP Integration: Connect to external tools and data sources using the Model Context Protocol (MCP) via MCPServerStdio for local processes or MCPServerHTTP for remote services.

Configuration Options

  • model: The Ultravox model to use (e.g., "fixie-ai/ultravox").
  • api_key: Your Ultravox API key (can also be set via the ULTRAVOX_API_KEY environment variable).
  • config: An UltravoxLiveConfig object for advanced options:
    • voice: (str or None) The Voice ID for the synthesized speech.
    • language_hint: (str or None) A hint for the conversation's language, such as "en" (default: "en").
    • temperature: (float or None) Controls the randomness of responses (0.0 to 1.0).
    • max_duration: (str or None) Maximum duration of the call (e.g., "600s").
    • time_exceeded_message: (str or None) Message spoken when the maximum duration is exceeded.
    • input_sample_rate: (int) Sample rate for input audio in Hz (default: 48000).
    • output_sample_rate: (int) Sample rate for output audio in Hz (default: 24000).
    • client_buffer_size_ms: (int) Client-side audio buffer size in milliseconds (default: 30000).
    • vad_turn_endpoint_delay: (int or None) Delay in milliseconds for voice activity detection to determine the end of a turn (default: 800).
    • vad_minimum_turn_duration: (int or None) The minimum duration in milliseconds for a valid speech turn (default: 600).
    • vad_minimum_interruption_duration: (int or None) The minimum duration in milliseconds of speech required to interrupt the agent.
    • vad_frame_activation_threshold: (float or None) Frame activation threshold for voice activity detection (default: 0.4).
    • first_speaker: (str or None) Determines who speaks first (default: "FIRST_SPEAKER_USER").
    • enable_greeting_prompt: (bool) Whether to enable an initial greeting prompt (default: False).
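
The options above can be collected and sanity-checked before they are passed to UltravoxLiveConfig. The sketch below is a hypothetical helper (build_live_config_kwargs is not part of the SDK) that relies only on the option names and ranges documented in this list. It assumes max_duration is a whole number of seconds followed by "s", as in the "600s" example.

```python
from typing import Any, Dict, Optional


def build_live_config_kwargs(
    voice: Optional[str] = None,
    temperature: Optional[float] = None,
    max_duration_seconds: Optional[int] = None,
    vad_turn_endpoint_delay: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate a few options and return them as keyword arguments.

    Hypothetical helper for illustration; options left as None are omitted
    so that the library's own defaults apply.
    """
    kwargs: Dict[str, Any] = {}
    if voice is not None:
        kwargs["voice"] = voice
    if temperature is not None:
        # The documented range for temperature is 0.0 to 1.0.
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        kwargs["temperature"] = float(temperature)
    if max_duration_seconds is not None:
        if max_duration_seconds <= 0:
            raise ValueError("max_duration_seconds must be positive")
        # Assumed format: seconds followed by "s", e.g. "600s".
        kwargs["max_duration"] = f"{max_duration_seconds}s"
    if vad_turn_endpoint_delay is not None:
        if vad_turn_endpoint_delay < 0:
            raise ValueError("vad_turn_endpoint_delay must not be negative")
        kwargs["vad_turn_endpoint_delay"] = vad_turn_endpoint_delay
    return kwargs
```

The returned dictionary could then be unpacked as UltravoxLiveConfig(**kwargs), assuming the class accepts these options as keyword arguments in the same way it accepts voice in the example usage.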
