Ultravox
The Ultravox provider enables your agent to use Ultravox's models for real-time, conversational AI interactions.
Installation
Install the Ultravox-enabled VideoSDK Agents package:
pip install "videosdk-plugins-ultravox"
Authentication
The Ultravox plugin requires an Ultravox API key.
Set the ULTRAVOX_API_KEY environment variable in your .env file.
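A missing key is easier to diagnose at startup than in the middle of a call. The following is a minimal sketch of such a check using only the standard library. `require_ultravox_api_key` is a hypothetical helper, not part of the SDK, and it assumes the .env file has already been loaded into the process environment (for example by a dotenv loader).

```python
import os


def require_ultravox_api_key(env=None):
    """Return the Ultravox API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of the VideoSDK Agents SDK.
    """
    # Allow a plain dict to be passed in for testing; default to the real environment.
    env = os.environ if env is None else env
    key = env.get("ULTRAVOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ULTRAVOX_API_KEY is not set; add it to your .env file or export it."
        )
    return key
```

Calling `require_ultravox_api_key()` once before building the pipeline fails fast with a readable message instead of an authentication error later on.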
Importing
from videosdk.agents.plugins import UltravoxRealtime, UltravoxLiveConfig
Example Usage
from videosdk.agents.plugins import UltravoxRealtime, UltravoxLiveConfig
from videosdk.agents import Pipeline
# Initialize the Ultravox real-time model
model = UltravoxRealtime(
    model="fixie-ai/ultravox",
    # Omit api_key when ULTRAVOX_API_KEY is already set in your .env file
    api_key="your-ultravox-api-key",
    config=UltravoxLiveConfig(
        voice="54ebeae1-88df-4d66-af13-6c41283b4332",
    ),
)
# Create the pipeline with the model
pipeline = Pipeline(llm=model)
Note
When using a .env file for credentials, you do not need to pass the api_key as an argument to the model instance; the SDK reads it automatically.
Key Features
- Real-time Interactions: Utilize Ultravox's powerful models for low-latency voice conversations.
- Function Calling: Empower your agent to perform actions like retrieving weather data or calling external APIs.
- Custom Agent Behaviors: Define a unique personality and interaction style for your agent through system prompts.
- Call Control: Agents can manage the conversation flow and gracefully terminate calls.
- MCP Integration: Connect to external tools and data sources using the Model Context Protocol (MCP) via MCPServerStdio for local processes or MCPServerHTTP for remote services.
Configuration Options
- model: The Ultravox model to use (e.g., "fixie-ai/ultravox").
- api_key: Your Ultravox API key (can also be set via the ULTRAVOX_API_KEY environment variable).
- config: An UltravoxLiveConfig object for advanced options:
  - voice: (str or None) The Voice ID for the synthesized speech.
  - language_hint: (str or None) A hint for the conversation's language (e.g., "en") (default: "en").
  - temperature: (float or None) Controls the randomness of responses (0.0 to 1.0).
  - max_duration: (str or None) Maximum duration of the call (e.g., "600s").
  - time_exceeded_message: (str or None) Message spoken when the maximum duration is exceeded.
  - input_sample_rate: (int) Sample rate for input audio in Hz (default: 48000).
  - output_sample_rate: (int) Sample rate for output audio in Hz (default: 24000).
  - client_buffer_size_ms: (int) Client-side audio buffer size in milliseconds (default: 30000).
  - vad_turn_endpoint_delay: (int or None) Delay in milliseconds for voice activity detection to determine the end of a turn (default: 800).
  - vad_minimum_turn_duration: (int or None) The minimum duration in milliseconds for a valid speech turn (default: 600).
  - vad_minimum_interruption_duration: (int or None) The minimum duration in milliseconds of speech required to interrupt the agent.
  - vad_frame_activation_threshold: (float or None) Frame activation threshold for voice activity detection (default: 0.4).
  - first_speaker: (str or None) Determines who speaks first (default: "FIRST_SPEAKER_USER").
  - enable_greeting_prompt: (bool) Whether to enable an initial greeting prompt (default: False).
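Several of the options above have documented ranges or formats, so it can help to check values before constructing the config. The following is a rough sketch under stated assumptions: `validate_ultravox_options` is a hypothetical helper, not part of the SDK, and the seconds-with-"s" pattern for max_duration is inferred only from the "600s" example above.

```python
import re


def validate_ultravox_options(options):
    """Check a dict of UltravoxLiveConfig-style options against the documented ranges.

    Hypothetical helper for illustration; returns a list of problems (empty if none).
    """
    problems = []

    # temperature is documented as 0.0 to 1.0
    temperature = options.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        problems.append("temperature must be between 0.0 and 1.0")

    # Assumption: max_duration is a number of seconds followed by "s", as in "600s"
    max_duration = options.get("max_duration")
    if max_duration is not None and not re.fullmatch(r"\d+(\.\d+)?s", max_duration):
        problems.append('max_duration must be a string of seconds such as "600s"')

    # Sample rates and buffer size are documented as int; require positive values
    for name in ("input_sample_rate", "output_sample_rate", "client_buffer_size_ms"):
        value = options.get(name)
        if value is not None and (not isinstance(value, int) or value <= 0):
            problems.append(f"{name} must be a positive integer")

    return problems


options = {
    "voice": "54ebeae1-88df-4d66-af13-6c41283b4332",
    "language_hint": "en",
    "temperature": 0.7,
    "max_duration": "600s",
    "vad_turn_endpoint_delay": 800,
}
assert validate_ultravox_options(options) == []
```

Once the list of problems is empty, the same dict could be unpacked into the config, for example `UltravoxLiveConfig(**options)`, since its keys match the option names listed above.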
Additional Resources
The following resources provide more information about using Ultravox with the VideoSDK Agents SDK.