AI Voice Agent Quick Start
This guide will help you integrate an AI-powered voice agent into your VideoSDK meetings.
Prerequisites
Before you begin, ensure you have:
- A VideoSDK authentication token (generate from app.videosdk.live)
- A VideoSDK meeting ID (you can generate one using the Create Room API)
- An OpenAI API key (or Gemini API key if using Google's models)
Installation
First, install the VideoSDK AI Agent package using pip:
```bash
pip install videosdk-agents
```
Environment Setup
It's recommended to use environment variables for secure storage of API keys and tokens. Create a `.env` file in your project root:

```
OPENAI_API_KEY=your_openai_api_key
VIDEOSDK_AUTH_TOKEN=your_videosdk_auth_token
GOOGLE_API_KEY=your_google_api_key
GOOGLE_APPLICATION_CREDENTIALS=path_to_google_credentials_json
```
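To make these values available to your script, load the file at startup. Here's a minimal sketch that assumes the `python-dotenv` package is installed (`pip install python-dotenv`); the SDK may also read these variables from the environment on its own:

```python
import os

from dotenv import load_dotenv

# Read the key-value pairs from .env into the process environment
load_dotenv()

VIDEOSDK_AUTH_TOKEN = os.getenv("VIDEOSDK_AUTH_TOKEN")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
```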
Generating a VideoSDK Meeting ID
Before your AI agent can join a meeting, you'll need to create a meeting ID. You can generate one using the VideoSDK Create Room API:
Using cURL
```bash
curl -X POST https://api.videosdk.live/v2/rooms \
  -H "Authorization: YOUR_JWT_TOKEN_HERE" \
  -H "Content-Type: application/json"
```
For more details on the Create Room API, refer to the VideoSDK documentation.
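If you'd rather create the room from Python, the same request looks roughly like this. This is a sketch based on the cURL call above; it assumes the `requests` package is installed and that the response JSON contains the new room's ID under a `roomId` field:

```python
import os

import requests

# Same endpoint and headers as the cURL example above
response = requests.post(
    "https://api.videosdk.live/v2/rooms",
    headers={
        "Authorization": os.getenv("VIDEOSDK_AUTH_TOKEN"),
        "Content-Type": "application/json",
    },
)
response.raise_for_status()

# Assumption: the response body exposes the new room's ID as "roomId"
meeting_id = response.json()["roomId"]
print(meeting_id)
```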
1. Creating a Custom Agent
First, let's create a custom voice agent by inheriting from the base `Agent` class:
```python
from videosdk.agents import Agent, AgentState, function_tool

class VoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant that can answer questions and help with tasks.",
        )
        # Register any tools the agent can use
        self.register_tools([self.get_weather, self.get_horoscope])

    async def on_enter(self) -> None:
        """Called when the agent first joins the meeting"""
        await self.session.say("Hi there! How can I help you today?")
```
This code defines a basic voice agent with:
- Custom instructions that define the agent's personality and capabilities
- An entry message spoken when the agent joins a meeting
- Registered tools (implemented in the next step) that extend what the agent can do
2. Implementing Function Tools
Function tools allow your agent to perform actions beyond conversation. Let's add two example tools:
```python
import aiohttp

class VoiceAgent(Agent):
    # ... previous code ...

    @function_tool
    async def get_weather(self, latitude: str, longitude: str):
        """Called when the user asks about the weather. This function will return the weather for
        the given location. When given a location, please estimate the latitude and longitude of the
        location and do not ask the user for them.

        Args:
            latitude: The latitude of the location
            longitude: The longitude of the location
        """
        print(f"Getting weather for {latitude}, {longitude}")
        url = f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current=temperature_2m"
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                if response.status == 200:
                    data = await response.json()
                    return {
                        "temperature": data["current"]["temperature_2m"],
                        "temperature_unit": "Celsius",
                    }
                else:
                    raise Exception(
                        f"Failed to get weather data, status code: {response.status}"
                    )

    @function_tool
    async def get_horoscope(self, sign: str) -> dict:
        """Get today's horoscope for a given zodiac sign.

        Args:
            sign: The zodiac sign (e.g., Aries, Taurus, Gemini, etc.)
        """
        horoscopes = {
            "Aries": "Today is your lucky day!",
            "Taurus": "Focus on your goals today.",
            "Gemini": "Communication will be important today.",
        }
        return {
            "sign": sign,
            "horoscope": horoscopes.get(sign, "The stars are aligned for you today!"),
        }
```
Each function tool:
- Is decorated with `@function_tool`
- Includes a detailed docstring that helps the AI understand when and how to use the tool
- Accepts parameters from the AI and returns structured data
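Because each tool is ultimately an async method on your agent, you can sanity-check it outside of a meeting. The sketch below assumes the `@function_tool` decorator leaves the method directly awaitable; verify this against your SDK version:

```python
import asyncio

async def main():
    agent = VoiceAgent()
    # Call the tool directly with the arguments the model would supply
    result = await agent.get_horoscope(sign="Gemini")
    print(result)  # {'sign': 'Gemini', 'horoscope': 'Communication will be important today.'}

asyncio.run(main())
```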
3. Setting Up the Pipeline
The pipeline connects your agent to an AI model. In this example, we're using OpenAI's real-time model:
```python
from videosdk.plugins.openai import OpenAIRealtime, OpenAIRealtimeConfig
from videosdk.agents import RealTimePipeline
from openai.types.beta.realtime.session import InputAudioTranscription, TurnDetection

async def start_session(jobctx):
    # Initialize the AI model
    model = OpenAIRealtime(
        model="gpt-4o-realtime-preview",
        config=OpenAIRealtimeConfig(
            modalities=["text", "audio"],
            input_audio_transcription=InputAudioTranscription(
                model="whisper-1"
            ),
            turn_detection=TurnDetection(
                type="server_vad",
                threshold=0.5,
                prefix_padding_ms=300,
                silence_duration_ms=200,
            ),
            tool_choice="auto",
        ),
    )

    pipeline = RealTimePipeline(model=model)

    # Continue to the next steps...
```
4. Assembling and Starting the Agent Session
Now, let's put everything together and start the agent session:
```python
import asyncio
from videosdk.agents import AgentSession, WorkerJob

async def start_session(jobctx):
    # ... previous setup code ...

    # Create the agent session
    session = AgentSession(
        agent=VoiceAgent(),
        pipeline=pipeline,
        context=jobctx,
    )

    try:
        # Start the session
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()

def entryPoint(jobctx):
    asyncio.run(start_session(jobctx))

if __name__ == "__main__":
    def make_context():
        return {
            "meetingId": "<Your-Meeting-ID>",  # Use the generated meeting ID from earlier
            "name": "AI Assistant",  # Name displayed in the meeting
        }

    # Create and start the worker job
    job = WorkerJob(job_func=entryPoint, jobctx=make_context)
    job.start()
```
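With everything in place, run the script (assuming you saved it as `main.py` with your `.env` file in the same directory):

```bash
python main.py
```

The agent will join the meeting specified in `make_context` and greet participants as soon as it connects.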
5. Connecting with VideoSDK Client Applications
After setting up your AI Agent, you'll need a client application to connect with it. You can use any of the VideoSDK quickstart examples to create a client that joins the same meeting.
When setting up your client application, make sure to use the same meeting ID that your AI Agent is using.