
WhatsApp Agent Quick Start

This quickstart guide will walk you through creating a powerful AI voice agent that can answer calls made to your WhatsApp Business number. We will achieve this using a direct SIP integration between the Meta Business Platform and VideoSDK, which simplifies the architecture and removes the need for a third-party telephony provider.

Architecture Overview - Call Flow

The diagram below illustrates the end-to-end call flow we are building. A call initiated by a WhatsApp User is received by the Meta Business Platform, which then forwards it directly via SIP to the VideoSDK SIP Gateway. From there, Routing Rules direct the call to our AI Agent.

[Figure: WhatsApp Voice Agent Call Flow]

Prerequisites for Meta Configuration

This guide assumes you have already completed the initial setup of your business presence on the Meta platform.

Essential: Meta Graph API Setup

Integrating inbound/outbound WhatsApp calls requires updating your number's settings via the Meta Graph API. This guide covers the process in Part 3: Enable WhatsApp SIP Forwarding. For a deeper understanding of the API, refer to the official Meta Graph API overview.

Part 1: Build and Run Your Custom Voice Agent

First, we'll create the AI agent that will handle the conversation logic. This agent will run on your local machine for testing.

Step 1: Project Setup

Create a directory for your project and add the following files:

  • .env: To store your secret credentials.
  • requirements.txt: To list the Python dependencies.
  • main.py: The main script for your AI agent.

Step 2: Add Credentials and Dependencies

In your .env file, add the necessary API keys.

.env
VIDEOSDK_TOKEN="your_videosdk_token_here"
GOOGLE_API_KEY="your_google_api_key_here"

API Keys: Get your Google API Key and create a VideoSDK account to generate your token.
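main.py reads GOOGLE_API_KEY with os.getenv, which returns None instead of failing when the key is absent, so a typo in .env may only surface later as an authentication error. A minimal pre-flight check, assuming you want to fail fast before starting the worker, could look like this. missing_credentials is a hypothetical helper, not part of VideoSDK or python-dotenv.

```python
from typing import Iterable, Mapping

# The two keys defined in this guide's .env file.
REQUIRED_KEYS = ("VIDEOSDK_TOKEN", "GOOGLE_API_KEY")


def missing_credentials(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS
) -> list[str]:
    """Return the required keys that are absent or blank in `env` (hypothetical helper)."""
    return [key for key in required if not env.get(key, "").strip()]
```

Calling missing_credentials(os.environ) right after load_dotenv() in main.py would list any of the two keys that are missing or blank, so you can exit with a clear message instead of starting the worker.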

In requirements.txt, add the dependencies.

requirements.txt
videosdk-agents==0.0.32
videosdk-plugins-google==0.0.32
python-dotenv==1.1.1

Latest Version: Check the latest videosdk-agents version on PyPI for the most recent release.
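To compare your pins against PyPI, the sketch below parses name==version lines from requirements.txt and reads the newest release from PyPI's public JSON API (https://pypi.org/pypi/<package>/json), whose info.version field holds the latest version. The helper names are illustrative, and the parser deliberately handles only plain name==version lines.

```python
import json
import urllib.request

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def latest_version_from_metadata(metadata: dict) -> str:
    """Extract the latest release string from a PyPI JSON API payload."""
    return metadata["info"]["version"]


def fetch_latest_version(package: str, timeout: float = 10.0) -> str:
    """Query PyPI's JSON API for `package` (requires network access)."""
    with urllib.request.urlopen(PYPI_JSON_URL.format(name=package), timeout=timeout) as resp:
        return latest_version_from_metadata(json.load(resp))


def pinned_versions(requirements_text: str) -> dict[str, str]:
    """Map package name -> pinned version for `name==version` lines; ignores comments."""
    pins: dict[str, str] = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline and full-line comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins
```

For example, if pinned_versions(open("requirements.txt").read())["videosdk-agents"] differs from fetch_latest_version("videosdk-agents"), your pin is not the latest release.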

Step 3: Create the Agent Logic

Paste the following code into main.py. This defines the agent's personality and sets it up to be discoverable by VideoSDK's telephony service.

main.py
import asyncio, os, traceback, logging
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, WorkerJob, Options
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
load_dotenv()  # load VIDEOSDK_TOKEN and GOOGLE_API_KEY from .env

# Define the agent's behavior and personality
class MyWhatsappAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a friendly and helpful assistant answering WhatsApp calls. Keep your responses concise and clear.",
        )

    async def on_enter(self) -> None:
        # Greeting spoken when the agent joins the call
        await self.session.say("Hello! You've reached the VideoSDK assistant. How can I help you today?")

    async def on_exit(self) -> None:
        # Farewell spoken when the agent leaves the call
        await self.session.say("Thank you for calling. Goodbye!")

async def start_session(context: JobContext):
    # Real-time Gemini model that listens and replies with audio
    model = GeminiRealtime(
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"])
    )
    pipeline = RealTimePipeline(model=model)
    session = AgentSession(agent=MyWhatsappAgent(), pipeline=pipeline)
    try:
        await context.connect()
        await session.start()
        # Keep the job alive; this never returns on its own
        await asyncio.Event().wait()
    finally:
        # Always release the session and the connection, even on errors
        await session.close()
        await context.shutdown()

if __name__ == "__main__":
    try:
        options = Options(
            agent_id="agent1",  # CRITICAL: Unique ID for routing
            register=True,  # REQUIRED: Register with VideoSDK for telephony
            max_processes=10,
        )
        job = WorkerJob(entrypoint=start_session, options=options)
        job.start()
    except Exception:
        traceback.print_exc()
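The try/finally in start_session is what ensures session.close() and context.shutdown() run when the job ends, even though asyncio.Event().wait() never returns on its own. The stdlib-only sketch below reproduces that pattern with a hypothetical FakeSession, assuming for illustration that the job is stopped by cancelling its asyncio task; it makes no claim about how WorkerJob stops jobs internally.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for AgentSession that records lifecycle calls."""

    def __init__(self, log: list[str]):
        self.log = log

    async def start(self) -> None:
        self.log.append("start")

    async def close(self) -> None:
        self.log.append("close")


async def run_until_cancelled(session: FakeSession) -> None:
    # Mirrors start_session(): start, then park on an Event that is never set.
    try:
        await session.start()
        await asyncio.Event().wait()
    finally:
        # Assumption: shutdown arrives as task cancellation; cleanup still runs.
        await session.close()


async def demo() -> list[str]:
    log: list[str] = []
    task = asyncio.create_task(run_until_cancelled(FakeSession(log)))
    await asyncio.sleep(0.01)  # let the task reach the wait
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the cancellation propagates after the finally block
    return log
```

asyncio.run(demo()) returns ["start", "close"]: the cancellation interrupts the wait, and the finally block still closes the session.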

Step 4: Install Dependencies and Run the Agent

CLI Commands
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

# Install packages
pip install -r requirements.txt

# Run the agent
python main.py
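Import errors at startup often mean main.py was launched with the system interpreter instead of the one in venv. The check below is a hypothetical addition you could place at the top of main.py. It relies on documented venv behavior: inside a virtual environment sys.prefix points at the environment directory, while sys.base_prefix still points at the base Python installation.

```python
import sys
import warnings


def in_virtualenv() -> bool:
    """True when the current interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def warn_if_global_interpreter() -> None:
    # Hypothetical guard: warn (rather than fail) so the agent can still start.
    if not in_virtualenv():
        warnings.warn("Not running inside a virtual environment; activate venv first.")
```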

Your agent is now running and waiting for connections. Keep the terminal open.

Part 2: Configure VideoSDK Gateways and Routing

Next, we need to tell VideoSDK how to handle incoming calls and where to send them.

Step 1: Configure an Inbound Gateway

This is the entry point for calls coming from WhatsApp into VideoSDK.

Go to Telephony > Inbound Gateways in the VideoSDK Dashboard and click Add.

Give your gateway a name (e.g., "WhatsApp Gateway") and enter your WhatsApp Business phone number.

Step 2: Configure an Outbound Gateway

This is the exit point for calls your agent makes to the phone network.

Go to Telephony > Outbound Gateways and click Add.

Give it a name and enter the SIP details from your provider. For WhatsApp, this gateway is what enables agent-initiated outbound calls.

To obtain the SIP username and password, use the Meta Graph API (switch to the Via API tab).

Step 3: Create a Routing Rule

This rule connects the Inbound Gateway to your specific AI agent.

Go to Telephony > Routing Rules and click Add.

Configure the rule:

  • Gateway: Select the "WhatsApp Gateway" you just created.
  • Numbers: Add your WhatsApp Business phone number.
  • Dispatch: Choose Agent.
  • Agent Type: Set to Self Hosted.
  • Agent ID: Enter agent1. This must exactly match the agent_id in your main.py script.

Click Create.
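Conceptually, the routing rule is a lookup from (gateway, dialed number) to an agent ID, and the call can only be answered if a running worker registered that same agent_id. The sketch below models that lookup. RoutingRule, route_call, is_agent_registered, and the first-match-wins order are assumptions for illustration, not VideoSDK's schema or matching logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingRule:
    # Hypothetical model of the dashboard fields, not VideoSDK's schema.
    gateway: str
    numbers: frozenset[str]
    agent_id: str


def route_call(rules: list[RoutingRule], gateway: str, called_number: str) -> Optional[str]:
    """Return the agent_id of the first rule matching the gateway and dialed number."""
    for rule in rules:
        if rule.gateway == gateway and called_number in rule.numbers:
            return rule.agent_id
    return None  # no rule matched, so no agent is dispatched


def is_agent_registered(agent_id: str, registered_agent_ids: set[str]) -> bool:
    # Exact, case-sensitive comparison in this sketch: "Agent1" != "agent1".
    return agent_id in registered_agent_ids
```

With rules = [RoutingRule("WhatsApp Gateway", frozenset({"+15551234567"}), "agent1")] (a made-up placeholder number), route_call(rules, "WhatsApp Gateway", "+15551234567") returns "agent1", which is why the Agent ID in the rule has to match agent_id in main.py.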


Part 3: Enable WhatsApp SIP Forwarding

Now, we'll instruct Meta to forward incoming WhatsApp calls to your VideoSDK Inbound Gateway. This is done via the Meta Graph API.

Step 1: API Request

Use the following curl command to update your WhatsApp phone number's settings.

cURL
curl --location 'https://graph.facebook.com/v19.0/{{phone_number_id}}/settings' \
--header 'Authorization: Bearer {{access_token}}' \
--header 'Content-Type: application/json' \
--data '{
  "calling": {
    "status": "ENABLED",
    "sip": {
      "status": "ENABLED",
      "servers": [ { "hostname": "9WXXXXXXX.sip.videosdk.live" } ]
    },
    "srtp_key_exchange_protocol": "DTLS"
  }
}'

Replace the placeholders:

  • {{phone_number_id}}: Your WhatsApp Business Phone Number ID from the Meta dashboard.
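  • {{access_token}}: A valid User or System User access token with the whatsapp_business_management permission.
  • 9WXXXXXXX.sip.videosdk.live: A masked example hostname; replace it with your own VideoSDK SIP hostname.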

Step 2: API Response

A successful request will return:

Response
{
"success": true
}
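If you prefer to script this Meta call from Python, the cURL request can be reproduced with the standard library. This is a sketch under the assumption that you send the same JSON body and Graph API version as the cURL example; build_sip_forwarding_request and is_success are hypothetical helper names, and the request is only built here, not sent.

```python
import json
import urllib.request

GRAPH_API_VERSION = "v19.0"  # same version as the cURL example


def build_sip_forwarding_request(
    phone_number_id: str, access_token: str, sip_hostname: str
) -> urllib.request.Request:
    """Build (but do not send) the POST that enables calling and SIP forwarding."""
    body = {
        "calling": {
            "status": "ENABLED",
            "sip": {"status": "ENABLED", "servers": [{"hostname": sip_hostname}]},
            "srtp_key_exchange_protocol": "DTLS",
        }
    }
    return urllib.request.Request(
        url=f"https://graph.facebook.com/{GRAPH_API_VERSION}/{phone_number_id}/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def is_success(response_body: bytes) -> bool:
    """True only for the {"success": true} reply shown above."""
    return json.loads(response_body).get("success") is True
```

To send it, pass the request to urllib.request.urlopen and call is_success(resp.read()) on the reply; urlopen raises urllib.error.HTTPError for error statuses such as an invalid access token.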

Your integration is now complete! Meta will forward incoming voice calls on your WhatsApp Business number to VideoSDK, which routes them to your running agent.


Time to Talk! Test Your Agent

Keep Your Agent Running

Make sure your main.py script is still running locally before making or receiving calls. The agent must be active to handle any communication.

Receive an Inbound Call

  1. Ensure your main.py script is still running locally.
  2. Using a different WhatsApp account, place a voice call to your WhatsApp Business number.
  3. Your local agent will answer, and you'll hear its greeting. Start a conversation!

Make an Outbound Call

To have your agent initiate a call to a WhatsApp number, use the VideoSDK SIP Call API.

cURL
curl --request POST \
--url https://api.videosdk.live/v2/sip/call \
--header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
--header 'Content-Type: application/json' \
--data '{ "gatewayId": "your_outbound_gateway_id", "sipCallTo": "whatsapp_number_to_call" }'

This request has your agent dial out through the outbound gateway you configured.
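The same outbound request can be issued from Python, for example from your own backend when a call should be triggered. The sketch mirrors the cURL fields exactly (gatewayId, sipCallTo, and the raw token in the Authorization header). The helper name is hypothetical, and because this guide does not document the response body, the sketch does not parse it.

```python
import json
import urllib.request

SIP_CALL_URL = "https://api.videosdk.live/v2/sip/call"


def build_outbound_call_request(
    token: str, gateway_id: str, number_to_call: str
) -> urllib.request.Request:
    """Build (but do not send) the POST from the cURL example."""
    payload = {"gatewayId": gateway_id, "sipCallTo": number_to_call}
    return urllib.request.Request(
        SIP_CALL_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Raw token with no "Bearer" prefix, exactly as in the cURL example.
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )
```

In practice you would read the token with os.getenv("VIDEOSDK_TOKEN") and send the request with urllib.request.urlopen.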

Geographic Optimization

For optimal performance, run your agent in the same geographic region as your SIP provider. This reduces latency and improves call quality.
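One rough way to compare candidate regions is to time TCP connections from each machine that could host the agent. This sketch measures the median connect time to a host and port of your choosing. It only approximates network distance (it ignores the media path and server processing time), and tcp_connect_ms is a hypothetical helper.

```python
import socket
import time


def tcp_connect_ms(host: str, port: int, attempts: int = 3, timeout: float = 3.0) -> float:
    """Median TCP connect time in milliseconds over `attempts` connections."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]  # middle sample (upper median when attempts is even)
```

For example, running tcp_connect_ms("api.videosdk.live", 443) from each candidate host gives comparable numbers; which endpoint best represents your SIP path is an assumption you should validate.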

Next Steps

Congratulations! You've built an AI voice agent, run it locally, and connected it to WhatsApp for both inbound and outbound calls.

Got a question? Ask us on Discord.