WhatsApp Agent Quick Start

This quickstart guide will walk you through creating a powerful AI voice agent that can answer calls made to your WhatsApp Business number. We will achieve this using a direct SIP integration between the Meta Business Platform and VideoSDK, which simplifies the architecture and removes the need for a third-party telephony provider.

Architecture Overview - Call Flow

The end-to-end call flow we are building works as follows. A call initiated by a WhatsApp user is received by the Meta Business Platform, which forwards it directly via SIP to the VideoSDK SIP Gateway. From there, Routing Rules direct the call to our AI Agent.

Prerequisites for Meta Configuration

This guide assumes you have already completed the initial setup of your business presence on the Meta platform.

A verified Meta (Facebook) Business Manager account.
A phone number that has been added and verified in your WhatsApp Business Account (WABA).
A Meta Developer App with the whatsapp_business_management permission enabled.
A permanent user access token for the Meta Graph API.

Essential: Meta Graph API Setup

Integrating inbound/outbound WhatsApp calls requires updating your number's settings via the Meta Graph API. This guide covers the process in Part 3: Enable WhatsApp SIP Forwarding. For a deeper understanding of the API, refer to the official Meta Graph API overview.

Part 1: Build and Run Your Custom Voice Agent

First, we'll create the AI agent that will handle the conversation logic. This agent will run on your local machine for testing.

Step 1: Project Setup

Create a directory for your project and add the following files:

.env: To store your secret credentials.
requirements.txt: To list the Python dependencies.
main.py: The main script for your AI agent.

Step 2: Add Credentials and Dependencies

In your .env file, add the necessary API keys.

Use the block that matches the pipeline you plan to run.

Realtime Pipeline

.env
VIDEOSDK_AUTH_TOKEN="your_videosdk_token_here"
GOOGLE_API_KEY="your_google_api_key_here"

API Keys: Get your Google API Key and create a VideoSDK Account to generate your token.

Cascading Pipeline

.env
VIDEOSDK_AUTH_TOKEN="your_videosdk_token_here"
DEEPGRAM_API_KEY="your_deepgram_api_key_here"
OPENAI_API_KEY="your_openai_api_key_here"
ELEVENLABS_API_KEY="your_elevenlabs_api_key_here"

API Keys: Get keys from Deepgram, OpenAI, and ElevenLabs, and create a VideoSDK Account to generate your VideoSDK token.
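Before starting the agent, it can help to confirm that the keys for your chosen pipeline are actually set. The following is a minimal sketch, not part of the VideoSDK SDK: the variable names come from the .env examples above, while the `REQUIRED_KEYS` grouping and the `missing_keys` helper are hypothetical names introduced here for illustration.

```python
import os

# Variable names are taken from the .env examples; grouping them
# per pipeline is an assumption made for this sketch.
REQUIRED_KEYS = {
    "realtime": ["VIDEOSDK_AUTH_TOKEN", "GOOGLE_API_KEY"],
    "cascading": [
        "VIDEOSDK_AUTH_TOKEN",
        "DEEPGRAM_API_KEY",
        "OPENAI_API_KEY",
        "ELEVENLABS_API_KEY",
    ],
}


def missing_keys(pipeline, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS[pipeline] if not env.get(key)]
```

Calling `missing_keys("realtime")` right after `load_dotenv()` in main.py and stopping when the list is non-empty surfaces a missing key before any call arrives.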

In requirements.txt, add the dependencies.

Realtime Pipeline

requirements.txt
videosdk-agents
videosdk-plugins-google
python-dotenv

Cascading Pipeline

requirements.txt
videosdk-agents[deepgram,openai,elevenlabs,silero,turn_detector]
python-dotenv

Latest Version: Check the latest videosdk-agents version on PyPI for the most recent release.

Step 3: Create the Agent Logic

Paste the following code into main.py. This defines the agent's personality and sets it up to be discoverable by VideoSDK's telephony service.

The Realtime Pipeline version comes first, followed by the Cascading Pipeline version.

main.py (Realtime Pipeline)
import asyncio, os, traceback, logging
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, WorkerJob, Options
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
load_dotenv()

# Define the agent's behavior and personality
class MyWhatsappAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a friendly and helpful assistant answering WhatsApp calls. Keep your responses concise and clear.",
        )
    async def on_enter(self) -> None:
        await self.session.say("Hello! You've reached the VideoSDK assistant. How can I help you today?")
    async def on_exit(self) -> None:
        await self.session.say("Thank you for calling. Goodbye!")

async def start_session(context: JobContext):

    model = GeminiRealtime(
        model="gemini-2.5-flash-native-audio-preview-12-2025",
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"])
    )

    pipeline = RealTimePipeline(model=model)
    session = AgentSession(agent=MyWhatsappAgent(), pipeline=pipeline)

    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

if __name__ == "__main__":
    try:
        options = Options(
            agent_id="agent1",  # CRITICAL: Unique ID for routing
            register=True,      # REQUIRED: Register with VideoSDK for telephony
            max_processes=10,
        )
        job = WorkerJob(entrypoint=start_session, options=options)
        job.start()
    except Exception as e:
        traceback.print_exc()

main.py (Cascading Pipeline)
import asyncio, os, traceback, logging
from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, WorkerJob, Options, ConversationFlow
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
load_dotenv()
pre_download_model()

# Define the agent's behavior and personality
class MyWhatsappAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a friendly and helpful assistant answering WhatsApp calls. Keep your responses concise and clear.",
        )
    async def on_enter(self) -> None:
        await self.session.say("Hello! You've reached the VideoSDK assistant. How can I help you today?")
    async def on_exit(self) -> None:
        await self.session.say("Thank you for calling. Goodbye!")

async def start_session(context: JobContext):

    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )
    # Share one agent instance between the session and the conversation flow
    agent = MyWhatsappAgent()
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=ConversationFlow(agent)
    )
    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

if __name__ == "__main__":
    try:
        options = Options(
            agent_id="agent1",  # CRITICAL: Unique ID for routing
            register=True,      # REQUIRED: Register with VideoSDK for telephony
            max_processes=10,
        )
        job = WorkerJob(entrypoint=start_session, options=options)
        job.start()
    except Exception as e:
        traceback.print_exc()

Step 4: Install Dependencies and Run the Agent

CLI Commands
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install packages
pip install -r requirements.txt

# Run the agent
python main.py

Your agent is now running and waiting for connections. Keep the terminal open.

Part 2: Configure VideoSDK Gateways and Routing

Next, we need to tell VideoSDK how to handle incoming calls and where to send them.

Step 1: Configure an Inbound Gateway

This is the entry point for calls coming from WhatsApp into VideoSDK.

Via Dashboard

Go to Telephony > Inbound Gateways in the VideoSDK Dashboard and click Add.

Give your gateway a name (e.g., "WhatsApp Gateway") and enter your WhatsApp Business phone number.

Via API

cURL
curl --request POST \
  --url https://api.videosdk.live/v2/sip/inbound-gateways \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{ "name": "WhatsApp Gateway", "numbers": ["+1234567890"] }'

API Reference: Create Inbound Gateway
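If you prefer to script this step, the same request can be assembled in Python. This is a sketch using only the standard library; the URL, headers, and body fields mirror the cURL example above, and `build_inbound_gateway_request` is a hypothetical helper name. The function only builds the request and sends nothing.

```python
import json
import urllib.request

INBOUND_GATEWAYS_URL = "https://api.videosdk.live/v2/sip/inbound-gateways"


def build_inbound_gateway_request(token, name, numbers):
    """Build (but do not send) the POST request from the cURL example."""
    body = json.dumps({"name": name, "numbers": list(numbers)}).encode("utf-8")
    return urllib.request.Request(
        INBOUND_GATEWAYS_URL,
        data=body,
        method="POST",
        headers={"Authorization": token, "Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. Keeping construction separate from sending makes the payload easy to inspect before it reaches the API.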

Step 2: Configure an Outbound Gateway

This is the exit point for calls your agent makes to the phone network.

Via Dashboard

Go to Telephony > Outbound Gateways and click Add.

Give it a name and provide the SIP details from your provider. For WhatsApp, this step enables agent-initiated outbound calls.

The username and password come from the Meta Graph API; follow the Via API steps to retrieve them.

Via API

Get SIP Credentials from Meta

First, you need to get the SIP credentials from the Meta Graph API.

cURL
curl --location --globoff 'https://graph.facebook.com/v17.0/{{phone_number_id}}/settings?include_sip_credentials=true' \
--header 'Authorization: Bearer {{access_token}}' \
--header 'Content-Type: application/json'

The API response will look something like this:

Response
{
  "calling": {
    "status": "ENABLED",
    "call_icon_visibility": "DEFAULT",
    "callback_permission_status": "DISABLED",
    "srtp_key_exchange_protocol": "DTLS",
    "sip": {
      "status": "ENABLED",
      "servers": [
        {
          "app_id": 1300814931425659,
          "hostname": "9WXXXXXXXX.sip.videosdk.live",
          "sip_user_password": "v18yo4xxxxxxxxxxxx"
        }
      ]
    }
  },
  "storage_configuration": {
    "status": "DEFAULT"
  }
}
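The two values needed for the outbound gateway sit a few levels deep in this response. A minimal sketch of pulling them out, assuming the response has the shape shown above; `extract_sip_credentials` is a hypothetical helper, and the sample dictionary is a trimmed copy of the example response.

```python
def extract_sip_credentials(settings):
    """Return (hostname, sip_user_password) for the first SIP server listed."""
    sip = settings["calling"]["sip"]
    if sip.get("status") != "ENABLED" or not sip.get("servers"):
        raise ValueError("SIP is not enabled or no servers are configured")
    server = sip["servers"][0]
    return server["hostname"], server["sip_user_password"]


# Trimmed copy of the example response, with the same placeholder values.
sample_settings = {
    "calling": {
        "status": "ENABLED",
        "sip": {
            "status": "ENABLED",
            "servers": [
                {
                    "app_id": 1300814931425659,
                    "hostname": "9WXXXXXXXX.sip.videosdk.live",
                    "sip_user_password": "v18yo4xxxxxxxxxxxx",
                }
            ],
        },
    }
}

hostname, password = extract_sip_credentials(sample_settings)
```

Raising an error when SIP is disabled or the server list is empty avoids creating a gateway with blank credentials.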

Create VideoSDK Outbound Gateway

Now, use the sip_user_password from the previous step to create an outbound gateway in VideoSDK.

cURL
curl --request POST \
  --url https://api.videosdk.live/v2/sip/outbound-gateways \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{ "name": "My Outbound Gateway", "numbers": ["+1234567890"], "address": "9WXXXXXXXX.sip.videosdk.live", "auth": { "username": "your_whatsapp_number", "password": "v18yo4xxxxxxxxxxxx" } }'

API Reference: Create Outbound Gateway
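As a sketch of how the Meta values map onto this request body, the helper below builds the same JSON structure as the cURL example. The field names are copied from that example; `build_outbound_gateway_payload` and its argument names are hypothetical, and the username is passed in as-is because the example only shows a placeholder for it.

```python
def build_outbound_gateway_payload(name, number, hostname, username, sip_user_password):
    """Return the JSON body for creating an outbound gateway."""
    if not hostname or not sip_user_password:
        raise ValueError("hostname and sip_user_password are both required")
    return {
        "name": name,
        "numbers": [number],
        "address": hostname,  # "hostname" from the Meta settings response
        "auth": {
            "username": username,
            "password": sip_user_password,  # "sip_user_password" from Meta
        },
    }
```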

Step 3: Create a Routing Rule

This rule connects the Inbound Gateway to your specific AI agent.

Via Dashboard

Go to Telephony > Routing Rules and click Add.

Configure the rule:

Gateway: Select the "WhatsApp Gateway" you just created.
Numbers: Add your WhatsApp Business phone number.
Dispatch: Choose Agent.
Agent Type: Set to Self Hosted.
Agent ID: Enter agent1. This must exactly match the agent_id in your main.py script.

Click Create.

Via API

cURL
curl --request POST \
  --url https://api.videosdk.live/v2/sip/routing-rules \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{ "gatewayId": "your_inbound_gateway_id", "name": "WhatsApp Call Routing", "numbers": ["+1234567890"], "dispatch": "agent", "agentType": "self_hosted", "agentId": "agent1" }'

API Reference: Create Routing Rule
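Because calls only reach the agent when the routing rule's agent ID matches the agent_id in main.py, a quick consistency check can catch a mismatch early. This is an illustrative sketch: it scans the script text with a regular expression rather than importing it, and `find_agent_id` and `rule_matches_script` are hypothetical helpers, not VideoSDK functions.

```python
import re

# Matches agent_id="..." or agent_id='...' as written in the Options(...) call.
AGENT_ID_PATTERN = re.compile(r"agent_id\s*=\s*[\"']([^\"']+)[\"']")


def find_agent_id(source):
    """Return the first agent_id literal found in the script text, or None."""
    match = AGENT_ID_PATTERN.search(source)
    return match.group(1) if match else None


def rule_matches_script(rule, source):
    """True only if the rule's agentId equals the script's agent_id."""
    agent_id = find_agent_id(source)
    return agent_id is not None and rule.get("agentId") == agent_id
```

For example, `rule_matches_script(rule, open("main.py").read())` returns False if the rule says agent1 but the script registers a different ID.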

Part 3: Enable WhatsApp SIP Forwarding

Now, we'll instruct Meta to forward incoming WhatsApp calls to your VideoSDK Inbound Gateway. This is done via the Meta Graph API.

Step 1: API Request

Use the following curl command to update your WhatsApp phone number's settings.

cURL
curl --location 'https://graph.facebook.com/v19.0/{{phone_number_id}}/settings' \
--header 'Authorization: Bearer {{access_token}}' \
--header 'Content-Type: application/json' \
--data '{ "calling": { "status": "ENABLED", "sip": { "status": "ENABLED", "servers": [ { "hostname": "9WXXXXXXXX.sip.videosdk.live" } ] }, "srtp_key_exchange_protocol": "DTLS" } }'

Replace the placeholders:

{{phone_number_id}}: Your WhatsApp Business Phone Number ID from the Meta dashboard.
{{access_token}}: A valid User or System User access token with whatsapp_business_management permission.
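The same settings update can be prepared in Python. The sketch below reproduces the URL pattern, headers, and JSON body of the curl command above using the standard library; the helper names are hypothetical, and the default `api_version` simply copies the version used in that command. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_calling_settings(hostname):
    """Return the settings body that enables calling and SIP forwarding."""
    return {
        "calling": {
            "status": "ENABLED",
            "sip": {"status": "ENABLED", "servers": [{"hostname": hostname}]},
            "srtp_key_exchange_protocol": "DTLS",
        }
    }


def build_settings_request(phone_number_id, access_token, hostname, api_version="v19.0"):
    """Build (but do not send) the POST request to the phone number's settings."""
    url = f"https://graph.facebook.com/{api_version}/{phone_number_id}/settings"
    return urllib.request.Request(
        url,
        data=json.dumps(build_calling_settings(hostname)).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```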

Step 2: API Response

A successful request will return:

Response
{
  "success": true
}

Your integration is now complete! Meta will forward every voice call placed to your WhatsApp number to VideoSDK, which then routes it to your running agent.

Time to Talk! Test Your Agent

Keep Your Agent Running

Make sure your main.py script is still running locally before making or receiving calls. The agent must be active to handle any communication.

Receive an Inbound Call

Ensure your main.py script is still running locally.
Using a different WhatsApp account, place a voice call to your WhatsApp Business number.
Your local agent will answer, and you'll hear its greeting. Start a conversation!

Make an Outbound Call

To have your agent initiate a call to a WhatsApp number, use the VideoSDK SIP Call API.

cURL
curl --request POST \
  --url https://api.videosdk.live/v2/sip/call \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{ "gatewayId": "your_outbound_gateway_id", "sipCallTo": "whatsapp_number_to_call" }'

This commands your agent to dial out through your configured outbound gateway.
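To trigger outbound calls from a script instead of the terminal, the request can be assembled as follows. This is a sketch under the same assumptions as the cURL example: the endpoint and the gatewayId and sipCallTo fields are copied from it, and `build_outbound_call_request` is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request

SIP_CALL_URL = "https://api.videosdk.live/v2/sip/call"


def build_outbound_call_request(token, gateway_id, sip_call_to):
    """Build (but do not send) the outbound call request."""
    if not gateway_id or not sip_call_to:
        raise ValueError("gateway_id and sip_call_to are both required")
    body = json.dumps({"gatewayId": gateway_id, "sipCallTo": sip_call_to})
    return urllib.request.Request(
        SIP_CALL_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Authorization": token, "Content-Type": "application/json"},
    )
```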

Geographic Optimization

For optimal performance, run your agent in the same geographic region as your SIP provider. This reduces latency and improves call quality.

Next Steps

Congratulations! You've built an AI voice agent, run it locally, and connected it to WhatsApp calling for both inbound and outbound communication.

Deploy Your Agent

Learn how to deploy your AI agent to production

Explore Telephony Docs

Comprehensive telephony documentation and guides

Provider Integrations

SIP provider setup guides (Twilio, Vonage, etc.)

Got a question? Ask us on Discord.
