
AI Telephony Agent Quick Start

This guide walks you through creating a fully functional AI telephony agent using the VideoSDK Agent SDK. You'll learn how to run the agent locally, connect it to the global telephone network using SIP, and enable it to handle both inbound and outbound phone calls. By the end, you'll have a working AI assistant that you can talk to from any phone.

The Architecture

Before we dive in, let's look at the high-level architecture. A call from the phone network is directed by a SIP provider (like Twilio) to VideoSDK's telephony infrastructure. A Routing Rule then dispatches the call to your self-hosted AI agent, which processes the audio and responds in real time.

Architecture: Connecting a Voice Agent to the Telephony Network

What You'll Build

We'll create a simple yet powerful project with the following structure:

├── main.py           # The core logic for your AI voice agent
├── requirements.txt  # Python package dependencies
└── .env              # Your secret credentials

Prerequisites

To get started, you'll need a few things:

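  • Python 3.12+: Ensure you have a modern version of Python installed.
  • VideoSDK Account: Sign up for a free VideoSDK account to get your VIDEOSDK_TOKEN. This token is used to authenticate your agent and manage telephony settings.
  • Google API Key: The agent in this guide uses the Gemini real-time model, so you need a GOOGLE_API_KEY.
  • SIP Provider Phone Number: A phone number purchased from a SIP provider (e.g., Twilio) is needed in Part 2 to connect the agent to the phone network.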

Part 1: Build and Run the AI Agent Locally

First, let's get the AI agent running on your machine.

Step 1: Set Up Your Project

  1. Create a .env file to store your secret keys. Add your credentials:

    .env
    VIDEOSDK_TOKEN="your_videosdk_token_here"
    GOOGLE_API_KEY="your_google_api_key_here"

    API Keys: Get a Google API key, create a VideoSDK account, and follow the VideoSDK guide to generate your VideoSDK token.

  2. Create a requirements.txt file and paste in the necessary dependencies:

    requirements.txt
    videosdk-agents==0.0.32
    videosdk-plugins-google==0.0.32
    python-dotenv==1.1.1
    requests==2.31.0

    Latest Version: The versions above are pinned. Check PyPI for the most recent videosdk-agents release.

  3. Finally, create the main.py file. This script defines your agent's personality and handles the connection to VideoSDK.

    main.py
    import asyncio
    import traceback
    from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob, Options
    from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
    from dotenv import load_dotenv
    import os
    import logging

    logging.basicConfig(level=logging.INFO)

    # Load VIDEOSDK_TOKEN and GOOGLE_API_KEY from the .env file
    load_dotenv()

    # Define the agent's behavior and personality
    class MyVoiceAgent(Agent):
        def __init__(self):
            super().__init__(
                instructions="You are a helpful AI assistant that answers phone calls. Keep your responses concise and friendly.",
            )

        async def on_enter(self) -> None:
            await self.session.say("Hello! I'm your real-time assistant. How can I help you today?")

        async def on_exit(self) -> None:
            await self.session.say("Goodbye! It was great talking with you!")

    async def start_session(context: JobContext):
        # Configure the Gemini model for real-time voice
        model = GeminiRealtime(
            api_key=os.getenv("GOOGLE_API_KEY"),
            config=GeminiLiveConfig(
                voice="Leda",
                response_modalities=["AUDIO"]
            )
        )
        pipeline = RealTimePipeline(model=model)
        session = AgentSession(agent=MyVoiceAgent(), pipeline=pipeline)

        try:
            await context.connect()
            await session.start()
            # Keep the session alive until the job is cancelled
            await asyncio.Event().wait()
        finally:
            await session.close()
            await context.shutdown()

    if __name__ == "__main__":
        try:
            # Register the agent with a unique ID
            options = Options(
                agent_id="MyTelephonyAgent",  # CRITICAL: Unique identifier for routing
                register=True,  # REQUIRED: Register with VideoSDK for telephony
                max_processes=10,  # Concurrent calls to handle
                host="localhost",
                port=8081,
            )
            job = WorkerJob(entrypoint=start_session, options=options)
            job.start()
        except Exception:
            traceback.print_exc()
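
Because main.py reads both credentials from the environment, a missing or placeholder value only surfaces once the agent tries to connect. The following is a minimal pre-flight sketch, not part of the VideoSDK SDK: the helper name missing_credentials and the PLACEHOLDERS set are hypothetical, and the only facts it relies on are the two variable names and placeholder strings from the .env file in step 1.

```python
import os

# Variable names and placeholder strings taken from the .env example in step 1.
REQUIRED_VARS = ("VIDEOSDK_TOKEN", "GOOGLE_API_KEY")
PLACEHOLDERS = {"your_videosdk_token_here", "your_google_api_key_here"}


def missing_credentials(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset, empty, or placeholders.

    Hypothetical helper for illustration; it is not provided by videosdk-agents.
    """
    env = os.environ if env is None else env
    missing = []
    for name in required:
        value = (env.get(name) or "").strip()
        if not value or value in PLACEHOLDERS:
            missing.append(name)
    return missing


# Example with an explicit mapping, so nothing depends on the real environment.
example_env = {
    "VIDEOSDK_TOKEN": "abc123",
    "GOOGLE_API_KEY": "your_google_api_key_here",  # still the placeholder
}
print(missing_credentials(example_env))  # ['GOOGLE_API_KEY']
```

In main.py, one option is to call missing_credentials() right after load_dotenv() and exit with a clear message if the returned list is not empty.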

Step 2: Set Up Your Environment and Install Dependencies

Create and activate a virtual environment to keep your project dependencies isolated.

# Create the virtual environment
python3 -m venv .venv

# Activate it (macOS/Linux)
source .venv/bin/activate
# On Windows, use: .venv\Scripts\activate

# Install the required packages
pip install -r requirements.txt

Step 3: Run the Agent

Now, start your agent by running the Python script.

python main.py

Your terminal will show that the agent is running and has registered itself with VideoSDK using the ID MyTelephonyAgent. This ID is crucial for routing calls to it later.

Running AI Agent Locally

Important: Keep this terminal window open. Your agent must remain running to accept connections.

Part 2: Connect Your Agent to the Phone Network

With your agent running locally, it's time to connect it to the outside world. This involves setting up gateways and routing rules in your VideoSDK dashboard.

Step 1: Configure an Inbound Gateway

An Inbound Gateway is the entry point for calls coming into VideoSDK.

  • Navigate to Telephony > Inbound Gateways in the VideoSDK Dashboard and click Add.
  • Give your gateway a name and enter the phone number you purchased from your SIP provider (e.g., Twilio, Vonage, Telnyx, Plivo, Exotel).
  • After creating it, copy the Inbound Gateway URL.
  • In your SIP provider's dashboard, paste this URL into the Origination SIP URI field for your phone number. This tells your provider to forward all incoming calls to VideoSDK.

Step 2: Configure an Outbound Gateway

An Outbound Gateway is the exit point for calls your agent makes to the phone network.

  • Go to Telephony > Outbound Gateways in the dashboard and click Add.
  • Give it a name and paste the Termination SIP URI and credentials from your SIP provider into the required fields.

Step 3: Create a Routing Rule

A Routing Rule acts as a switchboard, connecting your gateways to your agent. This is where the magic happens.

  • Go to Telephony > Routing Rules and click Add.
  • Configure the rule:
    • Gateway: Select the Inbound Gateway you just created.
    • Numbers: Add the phone number associated with the gateway.
    • Dispatch: Choose Agent.
    • Agent Type: Set to Self Hosted.
    • Agent ID: Enter MyTelephonyAgent. This must match the agent_id in your main.py file.
  • Click Create to save the rule.
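
The switchboard idea can be pictured with a small model. This is a conceptual sketch only, not how VideoSDK implements routing: the RoutingRule class, the dispatch function, and the gateway name "my-inbound-gateway" are all hypothetical. It illustrates why the Agent ID in the rule has to match the agent_id in main.py: a rule that names an unregistered agent has nowhere to send the call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical stand-in for a dashboard routing rule."""
    gateway: str    # name of the Inbound Gateway the rule is attached to
    numbers: tuple  # phone numbers covered by the rule
    agent_id: str   # must equal agent_id in Options(...) in main.py


def dispatch(rules, gateway, called_number) -> Optional[str]:
    """Return the agent ID of the first rule matching the gateway and number."""
    for rule in rules:
        if rule.gateway == gateway and called_number in rule.numbers:
            return rule.agent_id
    return None


# Illustrative values; the gateway name and number are placeholders.
rules = [RoutingRule("my-inbound-gateway", ("+14155550123",), "MyTelephonyAgent")]
registered_agents = {"MyTelephonyAgent"}  # IDs of agents currently running

target = dispatch(rules, "my-inbound-gateway", "+14155550123")
print(target in registered_agents)  # True
```

If the rule said "MyTelephonyAgnet" instead, dispatch would still return an ID, but the membership check would fail, which mirrors a call that reaches VideoSDK but never reaches your agent.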

Setting Up Routing Rules for an AI Agent

You have now instructed VideoSDK to route all inbound calls to your phone number directly to your running Python agent.

Part 3: Time to Talk! Make and Receive Calls

Your setup is complete! Let's test it out.

Keep Your Agent Running

Make sure your AI agent is still running locally before you place a test call. The agent must be active to answer incoming calls.

Making an Inbound Call

  • Using any phone, dial the phone number you configured on the Inbound Gateway.
  • Your local Python agent will automatically answer.
  • You'll hear the greeting: "Hello! I'm your real-time assistant. How can I help you today?"
  • Start talking! The agent will listen and respond in real-time.

Making an Outbound Call

You can trigger an outbound call from your agent using a simple API request.

Use curl or any API client to make a POST request to the VideoSDK API. Replace YOUR_VIDEOSDK_TOKEN, the gatewayId, and the sipCallTo number with your own values.

curl --request POST \
  --url https://api.videosdk.live/v2/sip/call \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "gatewayId": "gw_123456789",
    "sipCallTo": "+14155550123"
  }'

This will command your MyTelephonyAgent to dial the specified number and start a conversation.
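
The same request can be sent from Python. The sketch below mirrors the curl command using only the standard library, so it runs without extra packages (the requests package from requirements.txt would work equally well). The URL, headers, and JSON fields come from the curl example; the function names are hypothetical, and the response format is not described in this guide, so the body is returned as raw text.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
SIP_CALL_URL = "https://api.videosdk.live/v2/sip/call"


def build_outbound_call_request(token, gateway_id, sip_call_to):
    """Build (but do not send) the POST request for an outbound call."""
    payload = json.dumps({"gatewayId": gateway_id, "sipCallTo": sip_call_to})
    return urllib.request.Request(
        SIP_CALL_URL,
        data=payload.encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


def place_outbound_call(token, gateway_id, sip_call_to, timeout=10):
    """Send the request and return (status code, raw response text).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_outbound_call_request(token, gateway_id, sip_call_to)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Building the request makes no network call, so this is safe to run as-is.
request = build_outbound_call_request("YOUR_VIDEOSDK_TOKEN", "gw_123456789", "+14155550123")
print(request.get_method(), request.full_url)
```

To actually place a call, pass your real token, gateway ID, and destination number to place_outbound_call while the agent from Part 1 is running.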

Geographic Optimization

For optimal performance, run your agent in the same geographic region as your SIP provider (e.g., US East for Twilio, US West for Telnyx, Europe for Plivo). This reduces latency and improves call quality.

Next Steps

Congratulations! You've built an AI telephony agent, run it locally, and connected it to the global phone network for both inbound and outbound calls.
