AI Voice Agent Quick Start

This guide will help you integrate an AI-powered voice agent into your VideoSDK meetings.

Prerequisites

Before you begin, ensure you have:

  • A VideoSDK authentication token (generate one from app.videosdk.live)
  • A VideoSDK meeting ID (you can generate one using the Create Room API or through the VideoSDK dashboard)
  • Python 3.12 or higher
  • An API key for your chosen model provider:
    • OpenAI API key (for OpenAI models)
    • Google Gemini API key (for Gemini's LiveAPI)
    • AWS credentials (aws_access_key_id and aws_secret_access_key) for Amazon Nova Sonic

Installation

Create and activate a virtual environment with Python 3.12 or higher:

python3.12 -m venv venv
source venv/bin/activate

Next, install the VideoSDK AI Agent package using pip:

pip install videosdk-agents

Now it's time to install the plugin for your chosen AI model. Each plugin is tailored for seamless integration with the VideoSDK AI Agent SDK.

pip install "videosdk-plugins-openai"

Environment Setup

It's recommended to use environment variables for secure storage of API keys, secret tokens, and authentication tokens. Create a .env file in your project root:

.env
VIDEOSDK_AUTH_TOKEN=your_videosdk_auth_token
OPENAI_API_KEY=your_openai_api_key
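
In a real project you would typically load this file with a library such as python-dotenv (`from dotenv import load_dotenv; load_dotenv()`). If you prefer to avoid the extra dependency, a minimal loader for this simple KEY=value format can be sketched with the standard library alone (the `parse_env` and `load_env` helpers below are illustrative, not part of the SDK):

```python
import os

def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments.

    Hypothetical helper for illustration; python-dotenv covers real projects.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def load_env(path: str = ".env") -> None:
    """Read a .env file and export its entries into the process environment."""
    with open(path) as f:
        for key, value in parse_env(f.read()).items():
            # setdefault so already-exported variables take precedence
            os.environ.setdefault(key, value)
```

Call `load_env()` once at the top of `main.py`, before any SDK objects are created, so the credentials are visible when the SDK reads the environment.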

Generating a VideoSDK Meeting ID

Before your AI agent can join a meeting, you'll need to create a meeting ID. You can generate one using the VideoSDK Create Room API:

Using cURL

curl -X POST https://api.videosdk.live/v2/rooms \
-H "Authorization: YOUR_JWT_TOKEN_HERE" \
-H "Content-Type: application/json"

For more details on the Create Room API, refer to the VideoSDK documentation.
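
The same call can be made from Python with only the standard library. This sketch mirrors the cURL request above and assumes the response JSON carries the room's id in a roomId field (the helper names here are illustrative, not part of the SDK):

```python
import json
import urllib.request

def build_create_room_request(token: str) -> urllib.request.Request:
    # Mirrors the cURL example: POST /v2/rooms with the JWT in the
    # Authorization header.
    return urllib.request.Request(
        "https://api.videosdk.live/v2/rooms",
        method="POST",
        headers={"Authorization": token, "Content-Type": "application/json"},
    )

def create_meeting_id(token: str) -> str:
    # Send the request and return the room id from the JSON response
    # (assumed field name: "roomId").
    with urllib.request.urlopen(build_create_room_request(token)) as resp:
        return json.loads(resp.read())["roomId"]
```

Running `create_meeting_id` requires a valid VideoSDK auth token; the returned id is what you pass as `meetingId` later in this guide.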

1. Creating a Custom Agent

First, let's create a custom voice agent by inheriting from the base Agent class:

main.py
from videosdk.agents import Agent, function_tool

# External tool (defined in the next section)
# async def get_weather(latitude: str, longitude: str): ...

class VoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant that can answer questions and help with tasks.",
            tools=[get_weather]  # You can register any external tool defined outside of this scope
        )

    async def on_enter(self) -> None:
        """Called when the agent first joins the meeting"""
        await self.session.say("Hi there! How can I help you today?")

This code defines a basic voice agent with:

  • Custom instructions that define the agent's personality and capabilities
  • An entry message spoken when it joins a meeting
  • A registered external tool (get_weather, defined in the next section)

2. Implementing Function Tools

Function tools allow your agent to perform actions beyond conversation. There are two ways to define tools:

  • External Tools: Defined as standalone functions outside the agent class and registered via the tools argument in the agent's constructor.
  • Internal Tools: Defined as methods inside the agent class and decorated with @function_tool.

Below is an example of both:

main.py
import aiohttp

# External function tool
@function_tool
async def get_weather(latitude: str, longitude: str):
    """Called when the user asks about the weather. This function will return the weather for
    the given location. When given a location, please estimate the latitude and longitude of the
    location and do not ask the user for them.

    Args:
        latitude: The latitude of the location
        longitude: The longitude of the location
    """
    print(f"Getting weather for {latitude}, {longitude}")
    url = f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current=temperature_2m"

    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            if response.status == 200:
                data = await response.json()
                return {
                    "temperature": data["current"]["temperature_2m"],
                    "temperature_unit": "Celsius",
                }
            else:
                raise Exception(
                    f"Failed to get weather data, status code: {response.status}"
                )

class VoiceAgent(Agent):
    # ... previous code ...

    # Internal function tool
    @function_tool
    async def get_horoscope(self, sign: str) -> dict:
        """Get today's horoscope for a given zodiac sign.

        Args:
            sign: The zodiac sign (e.g., Aries, Taurus, Gemini, etc.)
        """
        horoscopes = {
            "Aries": "Today is your lucky day!",
            "Taurus": "Focus on your goals today.",
            "Gemini": "Communication will be important today.",
        }
        return {
            "sign": sign,
            "horoscope": horoscopes.get(sign, "The stars are aligned for you today!"),
        }

  • Use external tools for reusable, standalone functions (registered via tools=[...]).
  • Use internal tools for agent-specific logic as class methods.
  • Both must be decorated with @function_tool for the agent to recognize and use them.
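
Because function tools are ordinary Python callables, their logic can be exercised directly, without joining a meeting. A minimal sketch using a standalone copy of the horoscope logic above (no VideoSDK imports needed here):

```python
import asyncio

# Standalone copy of the internal tool's logic, for illustration only.
async def get_horoscope(sign: str) -> dict:
    horoscopes = {
        "Aries": "Today is your lucky day!",
        "Taurus": "Focus on your goals today.",
        "Gemini": "Communication will be important today.",
    }
    return {
        "sign": sign,
        "horoscope": horoscopes.get(sign, "The stars are aligned for you today!"),
    }

result = asyncio.run(get_horoscope("Aries"))
print(result["horoscope"])  # prints "Today is your lucky day!"
```

Checking tools in isolation like this makes it easier to tell a tool bug apart from a pipeline or model issue later.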

3. Setting Up the Pipeline

The pipeline connects your agent to an AI model. In this example, we're using OpenAI's real-time model:

main.py
from videosdk.plugins.openai import OpenAIRealtime, OpenAIRealtimeConfig
from videosdk.agents import RealTimePipeline
from openai.types.beta.realtime.session import TurnDetection

async def start_session(context: dict):
    # Initialize the AI model
    model = OpenAIRealtime(
        model="gpt-4o-realtime-preview",
        # When OPENAI_API_KEY is set in .env - DON'T pass the api_key parameter
        api_key="sk-proj-XXXXXXXXXXXXXXXXXXXX",
        config=OpenAIRealtimeConfig(
            voice="alloy",  # alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, and verse
            modalities=["text", "audio"],
            turn_detection=TurnDetection(
                type="server_vad",
                threshold=0.5,
                prefix_padding_ms=300,
                silence_duration_ms=200,
            ),
            tool_choice="auto"
        )
    )

    pipeline = RealTimePipeline(model=model)

    # Continue to the next steps...
note

When using a .env file for credentials, don't pass them as arguments to model instances or context objects. The SDK automatically reads environment variables, so omit api_key, videosdk_auth, and other credential parameters from your code.

4. Assembling and Starting the Agent Session

Now, let's put everything together and start the agent session:

main.py
import asyncio
from videosdk.agents import AgentSession

async def start_session(context: dict):
    # ... previous setup code ...

    # Create the agent session
    session = AgentSession(
        agent=VoiceAgent(),
        pipeline=pipeline,
        context=context
    )

    try:
        # Start the session
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()

if __name__ == "__main__":
    def make_context():
        # When VIDEOSDK_AUTH_TOKEN is set in .env - DON'T include videosdk_auth
        return {
            "meetingId": "your_actual_meeting_id_here",  # Replace with your actual meeting ID
            "name": "AI Voice Agent",
            "videosdk_auth": "your_videosdk_auth_token_here"  # Replace with your actual token
        }

    asyncio.run(start_session(context=make_context()))

5. Connecting with VideoSDK Client Applications

After setting up your AI Agent, you'll need a client application to connect with it. You can use any of the VideoSDK quickstart examples to create a client that joins the same meeting:

When setting up your client application, make sure to use the same meeting ID that your AI Agent is using.

6. Running the Project

Once you have completed the setup, you can run your AI Voice Agent project using Python. Make sure your .env file is properly configured and all dependencies are installed.

python main.py
tip

Get started quickly with the Quick Start Example for the VideoSDK AI Agent SDK: everything you need to build your first AI agent fast.

Got a question? Ask us on Discord.