Version: 1.0.x

AI Telephony Agent Quick Start

This guide walks you through creating a working AI telephony agent with the VideoSDK Agent SDK. You'll learn how to run the agent locally, connect it to the global telephone network over SIP, and handle both inbound and outbound phone calls. By the end, you'll have an AI assistant that you can talk to from any phone.

The Architecture

Before we dive in, let's look at the high-level architecture. A call from the phone network is directed by a SIP provider (such as Twilio) to VideoSDK's telephony infrastructure. A Routing Rule then dispatches the call to your self-hosted AI agent, which processes the audio and responds in real time.

Architecture: Connecting a Voice Agent to the Telephony Network

What You'll Build

We'll create a simple yet powerful project with the following structure:

├── main.py                 # The core logic for your AI voice agent
├── requirements.txt        # Python package dependencies
└── .env                    # Your secret credentials

Prerequisites

To get started, you'll need a few things:

Python 3.12+: Ensure you have a modern version of Python installed.
VideoSDK Account: Sign up for a free VideoSDK account and generate an auth token. The agent reads it from the VIDEOSDK_AUTH_TOKEN environment variable, and the same token is used to manage telephony settings.
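
The Python requirement above can be checked from a script before you install anything. This is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of the VideoSDK SDK.

```python
import sys

MIN_PYTHON = (3, 12)  # minimum version listed in the prerequisites


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if not meets_minimum():
    found = sys.version.split()[0]
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")
```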

Part 1: Build and Run the AI Agent Locally

First, let's get the AI agent running on your machine.

Step 1: Set Up Your Project

Create a .env file to store your secret keys. Add the credentials for the mode you plan to use.

Realtime Mode

.env
VIDEOSDK_AUTH_TOKEN="your_videosdk_token_here"
GOOGLE_API_KEY="your_google_api_key_here"

API Keys: Get a Google API Key, create your VideoSDK account, and follow the token generation guide to create a VideoSDK token.

Cascading Mode

.env
VIDEOSDK_AUTH_TOKEN="your_videosdk_token_here"
DEEPGRAM_API_KEY="your_deepgram_api_key_here"
OPENAI_API_KEY="your_openai_api_key_here"
ELEVENLABS_API_KEY="your_elevenlabs_api_key_here"

API Keys: Get API keys from Deepgram, OpenAI, and ElevenLabs, create your VideoSDK account, and follow the token generation guide to create a VideoSDK token.

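
A missing key usually only surfaces once a call is already in progress, so it can help to check the environment at startup. The sketch below uses the variable names from the .env examples above; the helper `missing_vars` and the `REQUIRED_VARS` table are hypothetical names introduced for illustration.

```python
import os

# Variable names taken from the .env examples, grouped by mode.
REQUIRED_VARS = {
    "realtime": ["VIDEOSDK_AUTH_TOKEN", "GOOGLE_API_KEY"],
    "cascading": [
        "VIDEOSDK_AUTH_TOKEN",
        "DEEPGRAM_API_KEY",
        "OPENAI_API_KEY",
        "ELEVENLABS_API_KEY",
    ],
}


def missing_vars(mode, env=None):
    """Return the required variable names that are unset or empty for `mode`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[mode] if not env.get(name)]
```

In main.py, one option is to call `missing_vars("realtime")` right after `load_dotenv()` and exit with a clear message if the returned list is not empty.
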
Create a requirements.txt file and paste in the dependencies for your mode.

Realtime Mode

requirements.txt
videosdk-agents
videosdk-plugins-google
python-dotenv

Cascading Mode

requirements.txt
videosdk-agents[deepgram,openai,elevenlabs,silero,turn_detector]
python-dotenv

Latest Version: Check the latest videosdk-agents version on PyPI for the most recent release.
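
To compare what is installed locally against the release listed on PyPI, the standard library can report the installed version. A small sketch; `installed_version` is a hypothetical helper name, and only the package name `videosdk-agents` comes from this guide.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("videosdk-agents:", installed_version("videosdk-agents") or "not installed")
```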

Finally, create the main.py file. This script defines your agent's personality and handles the connection to VideoSDK.

Realtime Mode

main.py
import asyncio
import traceback
from videosdk.agents import Agent, AgentSession, Pipeline, JobContext, RoomOptions, WorkerJob, Options
from videosdk.agents.plugins import GeminiRealtime, GeminiLiveConfig
from dotenv import load_dotenv
import os
import logging
logging.basicConfig(level=logging.INFO)

load_dotenv()

# Define the agent's behavior and personality
class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful AI assistant that answers phone calls. Keep your responses concise and friendly.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm your real-time assistant. How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye! It was great talking with you!")

async def start_session(context: JobContext):
    # Configure the Gemini model for real-time voice
    model = GeminiRealtime(
        model="gemini-3.1-flash-live-preview",
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(
            voice="Leda",
            response_modalities=["AUDIO"]
        )
    )
    pipeline = Pipeline(llm=model)
    session = AgentSession(agent=MyVoiceAgent(), pipeline=pipeline)

    await session.start(wait_for_participant=True, run_until_shutdown=True)

def make_context() -> JobContext:
    room_options = RoomOptions()
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    try:
        # Register the agent with a unique ID
        options = Options(
            agent_id="MyTelephonyAgent",  # CRITICAL: Unique identifier for routing
            register=True,               # REQUIRED: Register with VideoSDK for telephony
            max_processes=10,            # Concurrent calls to handle
            host="localhost",
            port=8081,
            )
        job = WorkerJob(entrypoint=start_session, jobctx=make_context, options=options)
        job.start()
    except Exception as e:
        traceback.print_exc()

Cascading Mode

main.py
import asyncio
import traceback
from videosdk.agents import Agent, AgentSession, Pipeline, JobContext, RoomOptions, WorkerJob, Options
from videosdk.agents.plugins import SileroVAD, TurnDetector, pre_download_model, DeepgramSTT, OpenAILLM, ElevenLabsTTS
from dotenv import load_dotenv
import os
import logging
logging.basicConfig(level=logging.INFO)

load_dotenv()

# Pre-downloading the Turn Detector model
pre_download_model()

# Define the agent's behavior and personality
class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful AI assistant that answers phone calls. Keep your responses concise and friendly.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm your AI telephony assistant. How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye! It was great talking with you!")

async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()

    # Create pipeline
    pipeline = Pipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline
    )

    await session.start(wait_for_participant=True, run_until_shutdown=True)

def make_context() -> JobContext:
    room_options = RoomOptions()
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    try:
        # Register the agent with a unique ID
        options = Options(
            agent_id="MyTelephonyAgent",  # CRITICAL: Unique identifier for routing
            register=True,               # REQUIRED: Register with VideoSDK for telephony
            max_processes=10,            # Concurrent calls to handle
            host="localhost",
            port=8081,
            )
        job = WorkerJob(entrypoint=start_session, jobctx=make_context, options=options)
        job.start()
    except Exception as e:
        traceback.print_exc()

Step 2: Set Up Your Environment and Install Dependencies

Create and activate a virtual environment to keep your project dependencies isolated.

# Create the virtual environment
python3 -m venv .venv

# Activate it (macOS/Linux)
source .venv/bin/activate
# On Windows, use: .venv\Scripts\activate

# Install the required packages
pip install -r requirements.txt

Step 3: Run the Agent

Now, start your agent by running the Python script.

python main.py

Your terminal will show that the agent is running and has registered itself with VideoSDK using the ID MyTelephonyAgent. This ID is crucial for routing calls to it later.

Running AI Agent Locally

Important: Keep this terminal window open. Your agent must remain running to accept connections.
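
The example passes host="localhost" and port=8081 to Options. Assuming the worker binds a local TCP listener on that address (an assumption; this guide does not state what listens there), a quick probe from a second terminal can confirm that the process is still up. The helper name `is_listening` is hypothetical.

```python
import socket


def is_listening(host="localhost", port=8081, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

A False result only means nothing accepted a TCP connection on that port; it says nothing about whether the agent is registered with VideoSDK.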

Part 2: Connect Your Agent to the Phone Network

With your agent running locally, it's time to connect it to the outside world. This involves setting up gateways and routing rules in your VideoSDK dashboard.

Step 1: Add SIP Configuration

Go to the VideoSDK Dashboard.
Click on Add Number.
Click on Configure SIP.
Give the configuration a name and add your phone number.

Step 2: Set Up Inbound Gateway

Via Dashboard

Copy the Inbound URL from the VideoSDK dashboard.
Go to Twilio and create a new SIP Trunk.
Go to the Origination section, paste the Inbound URL there, and save it.

Via API

curl --request POST \
  --url https://api.videosdk.live/v2/sip/inbound-gateways \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "My Inbound Gateway",
    "numbers": ["+1234567890"]
  }'

API Reference: Create Inbound Gateway

Step 3: Set Up Outbound Gateway

Via Dashboard

Go to the Termination section in Twilio, create a URI, and paste it into the outbound section in the VideoSDK dashboard.
Create a username and password in Twilio and add them to the VideoSDK outbound section.

Via API

curl --request POST \
  --url https://api.videosdk.live/v2/sip/outbound-gateways \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "My Outbound Gateway",
    "numbers": ["+12065551234"],
    "address": "sip.myprovider.com",
    "transport": "udp",
    "auth": {
      "username": "your-username",
      "password": "your-password"
    }
  }'

API Reference: Create Outbound Gateway
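
The phone numbers in the gateway payloads above are written in E.164 form: a leading +, then the country code and subscriber number, with at most 15 digits in total. Assuming the API expects that format (the examples suggest it, but this guide does not state it), a check like the following can catch malformed numbers before a request is sent. The helper names are hypothetical.

```python
import re

# E.164: "+", a non-zero leading digit, then 1 to 14 more digits (15 digits max).
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number):
    """Return True if `number` looks like an E.164 phone number, e.g. "+12065551234"."""
    return bool(E164_PATTERN.fullmatch(number))


def invalid_numbers(numbers):
    """Return the entries of `numbers` that are not in E.164 form."""
    return [n for n in numbers if not is_e164(n)]
```

The pattern checks shape only. It cannot tell whether a number is actually assigned or reachable.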

Step 4: Set Up Routing Rules

Via Dashboard

Click on Configure rule then Create new routing rule.
Add Routing Rule Name.
Select API Key.
Add Call Direction (Inbound or Outbound).
Add Phone Number.
Add Room Type.
Add Agent ID.
Click Save.

Via API

curl --request POST \
  --url https://api.videosdk.live/v2/sip/routing-rules \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "gatewayId": "gateway_in_123456789",
    "name": "Support Line Rule",
    "numbers": ["+1234567890"],
    "dispatch": "agent",
    "agentType": "self_hosted",
    "agentId": "MyTelephonyAgent"
  }'

API Reference: Create Routing Rule

You have now instructed VideoSDK to route all inbound calls to your phone number directly to your running Python agent.
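
Conceptually, a routing rule maps a dialed number to an agent ID. The toy model below illustrates that lookup. It is not VideoSDK's implementation, and the `RoutingRule` class and `dispatch` function are hypothetical names; only the sample values come from the API payload above.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingRule:
    name: str
    numbers: list = field(default_factory=list)  # phone numbers this rule covers
    agent_id: str = ""                           # agent that should receive the call


def dispatch(rules, dialed_number):
    """Return the agent ID of the first rule that lists `dialed_number`, or None."""
    for rule in rules:
        if dialed_number in rule.numbers:
            return rule.agent_id
    return None


rules = [RoutingRule("Support Line Rule", ["+1234567890"], "MyTelephonyAgent")]
print(dispatch(rules, "+1234567890"))   # MyTelephonyAgent
print(dispatch(rules, "+19995550000"))  # None
```

The practical consequence is that the agentId in the rule must match the agent_id passed to Options in main.py, or the call has no running agent to land on.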

Part 3: Time to Talk! Make and Receive Calls

Your setup is complete! Let's test it out.

Keep Your Agent Running

Make sure your AI agent is still running locally before you place a test call. The agent must be active to receive incoming calls.

Making an Inbound Call

Using any phone, dial the SIP number you configured.
Your local Python agent will automatically answer.
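You'll hear the greeting defined in on_enter. In Realtime Mode that is "Hello! I'm your real-time assistant. How can I help you today?"; in Cascading Mode the agent introduces itself as your AI telephony assistant.
Start talking! The agent will listen and respond in real time.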

Making an Outbound Call

You can trigger an outbound call from your agent using a simple API request.

Use curl or any API client to make a POST request to the VideoSDK API. Export your VideoSDK token as the shell variable YOUR_TOKEN, and replace the phone numbers and the routingRuleId with your own.

curl -H "Authorization: $YOUR_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "sipCallFrom" : "+14155550100",
    "sipCallTo" : "+14155550199",
    "routingRuleId" : "rr_2554md"
  }' \
  -X POST https://api.videosdk.live/v2/sip/call

This will command your MyTelephonyAgent to dial the specified number and start a conversation.
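
If you want to trigger outbound calls from application code rather than the shell, the same request can be built with the Python standard library. This sketch mirrors the curl command above field by field. The function names are hypothetical, and the shape of the response is not described in this guide, so the body is returned as raw text.

```python
import json
import urllib.request

CALL_URL = "https://api.videosdk.live/v2/sip/call"  # endpoint from the curl example


def build_call_request(token, call_from, call_to, routing_rule_id):
    """Build (but do not send) the POST request that the curl command issues."""
    payload = {
        "sipCallFrom": call_from,
        "sipCallTo": call_to,
        "routingRuleId": routing_rule_id,
    }
    return urllib.request.Request(
        CALL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


def place_call(request, timeout=10):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A typical call would read the token from the environment, for example `build_call_request(os.environ["VIDEOSDK_AUTH_TOKEN"], "+14155550100", "+14155550199", "rr_2554md")`, and pass the result to `place_call`. Keeping the build and send steps separate lets you inspect or log the request without dialing anyone.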

Geographic Optimization

For optimal performance, run your agent in the same geographic region as your SIP provider (e.g., US East for Twilio, US West for Telnyx, Europe for Plivo). This reduces latency and improves call quality.

Next Steps

Congratulations! You've built an AI telephony agent, run it locally, and connected it to the global phone network for both inbound and outbound calls.

Deploy Your Agent

Learn how to deploy your AI agent to production

Explore Telephony Docs

Comprehensive telephony documentation and guides

Provider Integrations

SIP provider setup guides (Twilio, Vonage, etc.)
