Ultravox

The Ultravox provider enables your agent to use Ultravox's models for real-time, conversational AI interactions.

Installation

Install the Ultravox-enabled VideoSDK Agents package:

pip install "videosdk-plugins-ultravox"

Authentication

The Ultravox plugin requires an Ultravox API key.

Set the ULTRAVOX_API_KEY in your .env file.
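
As a quick sanity check before starting the agent, you can confirm that the key is actually visible to the process. The sketch below is a minimal illustration using only the standard library; `require_api_key` is a hypothetical helper, not part of the VideoSDK SDK, and it assumes the value from your .env file has already reached the process environment (for example through a shell export or a loader such as python-dotenv).

```python
import os
from typing import Mapping, Optional

ENV_VAR = "ULTRAVOX_API_KEY"  # variable name taken from this page


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Ultravox API key, or raise if it is missing or blank.

    Hypothetical helper for illustration; not part of the VideoSDK SDK.
    """
    # Fall back to the real process environment when no mapping is supplied.
    source = os.environ if env is None else env
    key = source.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"{ENV_VAR} is not set; add it to your .env file or export it"
        )
    return key
```

Calling `require_api_key()` at startup fails fast with a readable message instead of surfacing a credential error later, once the model tries to connect.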

Importing

from videosdk.plugins.ultravox import UltravoxRealtime, UltravoxLiveConfig

Example Usage

from videosdk.plugins.ultravox import UltravoxRealtime, UltravoxLiveConfig
from videosdk.agents import RealTimePipeline

# Initialize the Ultravox real-time model
model = UltravoxRealtime(
    model="fixie-ai/ultravox",
    # Omit api_key when ULTRAVOX_API_KEY is already set in your .env file
    api_key="your-ultravox-api-key",
    config=UltravoxLiveConfig(
        voice="54ebeae1-88df-4d66-af13-6c41283b4332"
    )
)

# Create the pipeline with the model
pipeline = RealTimePipeline(model=model)

Note

When using a .env file for credentials, you do not need to pass api_key as an argument to the model instance; the SDK reads it automatically.

Key Features

  • Real-time Interactions: Use Ultravox's models for low-latency voice conversations.
  • Function Calling: Empower your agent to perform actions like retrieving weather data or calling external APIs.
  • Custom Agent Behaviors: Define a unique personality and interaction style for your agent through system prompts.
  • Call Control: Agents can manage the conversation flow and gracefully terminate calls.
  • MCP Integration: Connect to external tools and data sources using the Model Context Protocol (MCP) via MCPServerStdio for local processes or MCPServerHTTP for remote services.

Configuration Options

  • model: The Ultravox model to use (e.g., "fixie-ai/ultravox").
  • api_key: Your Ultravox API key (can also be set via the ULTRAVOX_API_KEY environment variable).
  • config: An UltravoxLiveConfig object for advanced options:
    • voice: (str) The Voice ID for the synthesized speech.
    • language_hint: (str) A hint for the conversation's language (e.g., "en").
    • temperature: (float) Controls the randomness of responses (0.0 to 1.0).
    • vad_turn_endpoint_delay: (int) Delay in milliseconds for voice activity detection to determine the end of a turn.
    • vad_minimum_turn_duration: (int) The minimum duration in milliseconds for a valid speech turn.
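
The options above can be collected and range-checked before they are handed to UltravoxLiveConfig. The following is a minimal sketch under stated assumptions: `build_live_config_kwargs` is a hypothetical helper (not part of the plugin), the field names and the 0.0 to 1.0 temperature range come from the list above, and the non-negative check on the millisecond values is an assumption rather than a documented constraint.

```python
from typing import Any, Dict, Optional


def build_live_config_kwargs(
    voice: str,
    language_hint: Optional[str] = None,
    temperature: Optional[float] = None,
    vad_turn_endpoint_delay: Optional[int] = None,
    vad_minimum_turn_duration: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate the documented options and return only those that were set.

    Hypothetical helper for illustration; not part of the Ultravox plugin.
    """
    if not voice:
        raise ValueError("voice must be a non-empty Voice ID")
    kwargs: Dict[str, Any] = {"voice": voice}

    if language_hint is not None:
        kwargs["language_hint"] = language_hint

    if temperature is not None:
        # Range taken from the option description: 0.0 to 1.0.
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        kwargs["temperature"] = float(temperature)

    # Both VAD settings are durations in milliseconds; rejecting negative
    # values is an assumption, not a documented rule.
    for name, value in (
        ("vad_turn_endpoint_delay", vad_turn_endpoint_delay),
        ("vad_minimum_turn_duration", vad_minimum_turn_duration),
    ):
        if value is not None:
            if value < 0:
                raise ValueError(f"{name} must be a non-negative number of milliseconds")
            kwargs[name] = int(value)

    return kwargs
```

The resulting dictionary can then be unpacked into the config, for example `UltravoxLiveConfig(**build_live_config_kwargs(voice="54ebeae1-88df-4d66-af13-6c41283b4332", language_hint="en", temperature=0.7))`. The temperature value here is illustrative only, not a recommended default.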

Additional Resources

The following resources provide more information about using Ultravox with the VideoSDK Agents SDK.