WhatsApp Agent Quick Start
This quickstart walks you through building an AI voice agent that answers calls made to your WhatsApp Business number. It uses a direct SIP integration between the Meta Business Platform and VideoSDK, which simplifies the architecture and removes the need for a third-party telephony provider.
Architecture Overview - Call Flow
The diagram below illustrates the end-to-end call flow we are building. A call initiated by a WhatsApp User is received by the Meta Business Platform, which then forwards it directly via SIP to the VideoSDK SIP Gateway. From there, Routing Rules direct the call to our AI Agent.
Prerequisites for Meta Configuration
This guide assumes you have already completed the initial setup of your business presence on the Meta platform.
- A verified Meta (Facebook) Business Manager account.
- A phone number that has been added and verified in your WhatsApp Business Account (WABA).
- A Meta Developer App with the whatsapp_business_management permission enabled.
- A permanent User access token for the Meta Graph API.
Integrating inbound/outbound WhatsApp calls requires updating your number's settings via the Meta Graph API. This guide covers the process in Part 3: Enable WhatsApp SIP Forwarding. For a deeper understanding of the API, refer to the official Meta Graph API overview.
Part 1: Build and Run Your Custom Voice Agent
First, we'll create the AI agent that will handle the conversation logic. This agent will run on your local machine for testing.
Step 1: Project Setup
Create a directory for your project and add the following files:
- .env: To store your secret credentials.
- requirements.txt: To list the Python dependencies.
- main.py: The main script for your AI agent.
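If you prefer to script the setup, the sketch below creates the project directory and the three empty files with the standard library's pathlib. The function name scaffold_project is a hypothetical helper, not part of VideoSDK.

```python
from __future__ import annotations

from pathlib import Path


def scaffold_project(root: str | Path) -> list[Path]:
    """Create the project directory plus empty .env, requirements.txt and main.py files.

    Hypothetical helper: existing files are left untouched (touch() does not truncate).
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in (".env", "requirements.txt", "main.py"):
        path = root / name
        path.touch()  # creates the file if missing; keeps contents if it exists
        created.append(path)
    return created
```

For example, scaffold_project("whatsapp-agent") creates a whatsapp-agent directory (the name is arbitrary) containing the three files.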
Step 2: Add Credentials and Dependencies
In your .env file, add the API keys for the pipeline you plan to use.

Realtime Pipeline:

```text
VIDEOSDK_TOKEN="your_videosdk_token_here"
GOOGLE_API_KEY="your_google_api_key_here"
```

API Keys: Get your Google API Key, and create a VideoSDK Account to generate your token.

Cascading Pipeline:

```text
VIDEOSDK_TOKEN="your_videosdk_token_here"
DEEPGRAM_API_KEY="your_deepgram_api_key_here"
OPENAI_API_KEY="your_openai_api_key_here"
ELEVENLABS_API_KEY="your_elevenlabs_api_key_here"
```

API Keys: Get keys from Deepgram, OpenAI, and ElevenLabs, and create a VideoSDK Account to generate your VideoSDK token.
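A missing or placeholder key usually surfaces only when the first call arrives, so it can help to check the environment at startup. The sketch below is a hypothetical helper (missing_keys is not a VideoSDK API); the variable names are the ones listed in the .env examples above, and it treats any value still starting with "your_" as an unfilled placeholder.

```python
import os
from collections.abc import Mapping

# Variable names taken from the two .env examples above.
REQUIRED_KEYS = {
    "realtime": ["VIDEOSDK_TOKEN", "GOOGLE_API_KEY"],
    "cascading": ["VIDEOSDK_TOKEN", "DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVENLABS_API_KEY"],
}


def missing_keys(pipeline: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required variables that are unset, empty, or still hold a "your_..." placeholder."""
    missing = []
    for key in REQUIRED_KEYS[pipeline]:
        value = env.get(key, "").strip()
        if not value or value.startswith("your_"):
            missing.append(key)
    return missing
```

In main.py you would call it right after load_dotenv(), for example by raising an error if missing_keys("cascading") returns a non-empty list.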
In requirements.txt, add the dependencies for your chosen pipeline.

Realtime Pipeline:

```text
videosdk-agents==0.0.32
videosdk-plugins-google==0.0.32
python-dotenv==1.1.1
```

Cascading Pipeline:

```text
videosdk-agents[deepgram,openai,elevenlabs,silero,turn_detector]==0.0.32
python-dotenv==1.1.1
```

Latest Version: Check the latest videosdk-agents version on PyPI for the most recent release.
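If you upgrade some packages but not others, the installed versions can drift from your pins. The sketch below uses the standard library's importlib.metadata to report which pinned packages are missing or differ; pin_mismatches and installed_version are hypothetical helpers, and the pins mirror the requirements.txt examples above.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements.txt examples above.
PINNED = {"videosdk-agents": "0.0.32", "python-dotenv": "1.1.1"}


def installed_version(package: str) -> str | None:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def pin_mismatches(pins: dict[str, str]) -> dict[str, str | None]:
    """Map each package whose installed version differs from its pin to what is installed."""
    mismatches = {}
    for name, pinned in pins.items():
        found = installed_version(name)
        if found != pinned:
            mismatches[name] = found  # None means the package is not installed
    return mismatches
```

Running pin_mismatches(PINNED) inside your virtual environment returns an empty dict when everything matches.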
Step 3: Create the Agent Logic
Paste the following code into main.py. It defines the agent's personality and registers it so that VideoSDK's telephony service can dispatch calls to it.

Realtime Pipeline:

```python
import asyncio, os, traceback, logging
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, WorkerJob, Options
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
load_dotenv()

# Define the agent's behavior and personality
class MyWhatsappAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a friendly and helpful assistant answering WhatsApp calls. Keep your responses concise and clear.",
        )

    async def on_enter(self) -> None:
        # Greeting spoken when the call connects
        await self.session.say("Hello! You've reached the VideoSDK assistant. How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Thank you for calling. Goodbye!")

async def start_session(context: JobContext):
    model = GeminiRealtime(
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"])
    )
    pipeline = RealTimePipeline(model=model)
    session = AgentSession(agent=MyWhatsappAgent(), pipeline=pipeline)
    try:
        await context.connect()
        await session.start()
        # Keep the session alive until the job is cancelled
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

if __name__ == "__main__":
    try:
        options = Options(
            agent_id="agent1",  # CRITICAL: Unique ID for routing
            register=True,      # REQUIRED: Register with VideoSDK for telephony
            max_processes=10,
        )
        job = WorkerJob(entrypoint=start_session, options=options)
        job.start()
    except Exception:
        traceback.print_exc()
```

Cascading Pipeline:

```python
import asyncio, os, traceback, logging
from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, WorkerJob, Options, ConversationFlow
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
load_dotenv()
pre_download_model()  # fetch the turn-detector model before the first call

# Define the agent's behavior and personality
class MyWhatsappAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a friendly and helpful assistant answering WhatsApp calls. Keep your responses concise and clear.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Hello! You've reached the VideoSDK assistant. How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Thank you for calling. Goodbye!")

async def start_session(context: JobContext):
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )
    # Create the agent once and share it with the conversation flow
    agent = MyWhatsappAgent()
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=ConversationFlow(agent)
    )
    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

if __name__ == "__main__":
    try:
        options = Options(
            agent_id="agent1",  # CRITICAL: Unique ID for routing
            register=True,      # REQUIRED: Register with VideoSDK for telephony
            max_processes=10,
        )
        job = WorkerJob(entrypoint=start_session, options=options)
        job.start()
    except Exception:
        traceback.print_exc()
```
Step 4: Install Dependencies and Run the Agent
```bash
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install packages
pip install -r requirements.txt

# Run the agent
python main.py
```
Your agent is now running and waiting for connections. Keep the terminal open.
Part 2: Configure VideoSDK Gateways and Routing
Next, we need to tell VideoSDK how to handle incoming calls and where to send them.
Step 1: Configure an Inbound Gateway
This is the entry point for calls coming from WhatsApp into VideoSDK.
Via Dashboard:
Go to Telephony > Inbound Gateways in the VideoSDK Dashboard and click Add.
Give your gateway a name (e.g., "WhatsApp Gateway") and enter your WhatsApp Business phone number.

Via API:

```bash
curl --request POST \
  --url https://api.videosdk.live/v2/sip/inbound-gateways \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{ "name": "WhatsApp Gateway", "numbers": ["+1234567890"] }'
```
API Reference: Create Inbound Gateway
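The same request can be made from Python with only the standard library. This sketch builds, but does not send, the POST shown in the curl example; build_inbound_gateway_request is a hypothetical helper, and the E.164 check is an assumption based on the "+1234567890" format used in the example rather than a documented VideoSDK rule.

```python
import json
import re
import urllib.request

API_BASE = "https://api.videosdk.live/v2/sip"
# Assumption: numbers follow E.164 ("+" then up to 15 digits), as in the curl example.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def build_inbound_gateway_request(token: str, name: str, numbers: list[str]) -> urllib.request.Request:
    """Build the Create Inbound Gateway request from the curl example (not sent yet)."""
    bad = [n for n in numbers if not E164.match(n)]
    if bad:
        raise ValueError(f"numbers should look like +1234567890: {bad}")
    body = json.dumps({"name": name, "numbers": numbers}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/inbound-gateways",
        data=body,
        method="POST",
        # VideoSDK takes the raw token in the Authorization header, as in the curl example
        headers={"Authorization": token, "Content-Type": "application/json"},
    )
```

To send it, pass the request to urllib.request.urlopen and read the JSON response; it should include the gateway ID that the routing rule needs (check the API reference for the exact field name).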
Step 2: Configure an Outbound Gateway
This is the exit point for calls your agent makes to the phone network.
Via Dashboard:
Go to Telephony > Outbound Gateways and click Add.
Give the gateway a name and enter its SIP details. For WhatsApp, this gateway is only needed if you want your agent to place outbound calls. The username and password come from the Meta Graph API, as described in the Via API steps below.

Via API:

Get SIP Credentials from Meta
First, get the SIP credentials from the Meta Graph API. Replace {{phone_id}} with your WhatsApp Business Phone Number ID and {{access_token}} with your access token.

```bash
curl --location --globoff 'https://graph.facebook.com/v17.0/{{phone_id}}/settings?include_sip_credentials=true' \
  --header 'Authorization: Bearer {{access_token}}' \
  --header 'Content-Type: application/json'
```
The API response will look something like this:

```json
{
  "calling": {
    "status": "ENABLED",
    "call_icon_visibility": "DEFAULT",
    "callback_permission_status": "DISABLED",
    "srtp_key_exchange_protocol": "DTLS",
    "sip": {
      "status": "ENABLED",
      "servers": [
        {
          "app_id": 1300814931425659,
          "hostname": "9WXXXXXXXX.sip.videosdk.live",
          "sip_user_password": "v18yo4xxxxxxxxxxxx"
        }
      ]
    }
  },
  "storage_configuration": {
    "status": "DEFAULT"
  }
}
```
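If you script this step, the hostname and password can be pulled out of the parsed response before creating the outbound gateway. This is a minimal sketch assuming the response shape shown above; sip_credentials is a hypothetical helper, and the assumption that sip_user_password appears only when include_sip_credentials=true is inferred from the query parameter's name.

```python
def sip_credentials(settings: dict) -> list[tuple[str, str]]:
    """Collect (hostname, sip_user_password) pairs from a phone-number settings response."""
    servers = settings.get("calling", {}).get("sip", {}).get("servers", [])
    return [
        (server["hostname"], server["sip_user_password"])
        for server in servers
        # Assumed to be present only when include_sip_credentials=true was requested
        if "sip_user_password" in server
    ]
```

Parse the raw body with json.loads first; an empty result means no SIP server with credentials was returned for that number.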
Create VideoSDK Outbound Gateway
Now use the sip_user_password from the previous step to create an outbound gateway in VideoSDK. Set the username to your WhatsApp Business number.

```bash
curl --request POST \
  --url https://api.videosdk.live/v2/sip/outbound-gateways \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{ "name": "My Outbound Gateway", "numbers": ["+1234567890"], "address": "9WXXXXXXXX.sip.videosdk.live", "auth": { "username": "your_whatsapp_number", "password": "v18yo4xxxxxxxxxxxx" } }'
```
API Reference: Create Outbound Gateway
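When the credentials come from a script rather than copy-paste, building the request body in code avoids quoting mistakes in the --data string. The sketch below mirrors the fields of the curl example; outbound_gateway_payload is a hypothetical helper, and defaulting the username to the business number follows the "your_whatsapp_number" placeholder above rather than a documented rule.

```python
def outbound_gateway_payload(
    name: str,
    business_number: str,
    hostname: str,
    sip_user_password: str,
    username: str = "",
) -> dict:
    """Build the Create Outbound Gateway body used in the curl example."""
    return {
        "name": name,
        "numbers": [business_number],
        "address": hostname,  # the "hostname" value from Meta's settings response
        "auth": {
            # Assumption: the username is your WhatsApp Business number unless overridden
            "username": username or business_number,
            "password": sip_user_password,
        },
    }
```

json.dumps(outbound_gateway_payload(...)) produces the string you would otherwise pass to --data.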
Step 3: Create a Routing Rule
This rule connects the Inbound Gateway to your specific AI agent.
Via Dashboard:
Go to Telephony > Routing Rules and click Add.
Configure the rule:
- Gateway: Select the "WhatsApp Gateway" you just created.
- Numbers: Add your WhatsApp Business phone number.
- Dispatch: Choose Agent.
- Agent Type: Set to Self Hosted.
- Agent ID: Enter agent1. This must exactly match the agent_id in your main.py script.

Click Create.

Via API:

```bash
curl --request POST \
  --url https://api.videosdk.live/v2/sip/routing-rules \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{ "gatewayId": "your_inbound_gateway_id", "name": "WhatsApp Call Routing", "numbers": ["+1234567890"], "dispatch": "agent", "agentType": "self_hosted", "agentId": "agent1" }'
```
API Reference: Create Routing Rule
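The most common routing mistake is an agentId that does not match the agent_id passed to Options in main.py. One way to guard against it, sketched below, is to keep the ID in a single constant and build the rule body from it; routing_rule_payload and AGENT_ID are hypothetical names, and the fields mirror the curl example.

```python
# Keep this identical to Options(agent_id=...) in main.py
AGENT_ID = "agent1"


def routing_rule_payload(
    gateway_id: str,
    numbers: list[str],
    agent_id: str = AGENT_ID,
    name: str = "WhatsApp Call Routing",
) -> dict:
    """Build the Create Routing Rule body from the curl example."""
    if not gateway_id:
        raise ValueError("gateway_id is required: use the ID of your inbound gateway")
    return {
        "gatewayId": gateway_id,
        "name": name,
        "numbers": numbers,
        "dispatch": "agent",
        "agentType": "self_hosted",
        "agentId": agent_id,
    }
```

Importing AGENT_ID from a shared module in both main.py and your setup script keeps the two values from drifting apart.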
Part 3: Enable WhatsApp SIP Forwarding
Now, we'll instruct Meta to forward incoming WhatsApp calls to your VideoSDK Inbound Gateway. This is done via the Meta Graph API.
Step 1: API Request
Use the following curl command to update your WhatsApp phone number's settings.

```bash
curl --location 'https://graph.facebook.com/v19.0/{{phone_number_id}}/settings' \
  --header 'Authorization: Bearer {{access_token}}' \
  --header 'Content-Type: application/json' \
  --data '{ "calling": { "status": "ENABLED", "sip": { "status": "ENABLED", "servers": [ { "hostname": "9WXXXXXXXX.sip.videosdk.live" } ] }, "srtp_key_exchange_protocol": "DTLS" } }'
```

Replace the placeholders:
- {{phone_number_id}}: Your WhatsApp Business Phone Number ID from the Meta dashboard.
- {{access_token}}: A valid User or System User access token with the whatsapp_business_management permission.
- 9WXXXXXXXX.sip.videosdk.live: Your VideoSDK SIP hostname.
Step 2: API Response
A successful request returns:

```json
{
  "success": true
}
```
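For automated setups, the Part 3 request and its success check can be expressed in Python with the standard library. This is a sketch, not an official client: build_sip_forwarding_request and forwarding_succeeded are hypothetical helpers, the v19.0 version and payload are copied from the curl example, and the request is built but not sent.

```python
from __future__ import annotations

import json
import urllib.request

GRAPH_BASE = "https://graph.facebook.com/v19.0"  # version used in the curl example


def build_sip_forwarding_request(phone_number_id: str, access_token: str, sip_hostname: str) -> urllib.request.Request:
    """Build the settings update that tells Meta to forward calls to a SIP hostname."""
    payload = {
        "calling": {
            "status": "ENABLED",
            "sip": {"status": "ENABLED", "servers": [{"hostname": sip_hostname}]},
            "srtp_key_exchange_protocol": "DTLS",
        }
    }
    return urllib.request.Request(
        f"{GRAPH_BASE}/{phone_number_id}/settings",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",  # curl --data sends a POST by default
        headers={"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"},
    )


def forwarding_succeeded(response_body: bytes | str) -> bool:
    """True only for a response body like {"success": true}."""
    return json.loads(response_body).get("success") is True
```

Send the request with urllib.request.urlopen and pass the response body to forwarding_succeeded; note that urlopen raises HTTPError for non-2xx responses, which you may want to catch and log.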
Your integration is now complete! Meta will forward incoming voice calls on your WhatsApp Business number to VideoSDK, which routes them to your running agent.
Time to Talk! Test Your Agent
Make sure your main.py script is still running locally before making or receiving calls. The agent must be active to handle any communication.
Receive an Inbound Call
- Ensure your main.py script is still running locally.
- Using a different WhatsApp account, place a voice call to your WhatsApp Business number.
- Your local agent will answer, and you'll hear its greeting. Start a conversation!
Make an Outbound Call
To have your agent initiate a call to a WhatsApp number, use the VideoSDK SIP Call API.
```bash
curl --request POST \
  --url https://api.videosdk.live/v2/sip/call \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{ "gatewayId": "your_outbound_gateway_id", "sipCallTo": "whatsapp_number_to_call" }'
```
This commands your agent to dial out through your configured outbound gateway.
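To trigger outbound calls from your own backend instead of the shell, the same request can be built in Python. This minimal sketch uses only the standard library and copies the URL and fields from the curl example; build_outbound_call_request is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request


def build_outbound_call_request(token: str, gateway_id: str, number_to_call: str) -> urllib.request.Request:
    """Build the SIP Call API request that makes the agent dial out via the outbound gateway."""
    body = json.dumps({"gatewayId": gateway_id, "sipCallTo": number_to_call}).encode("utf-8")
    return urllib.request.Request(
        "https://api.videosdk.live/v2/sip/call",
        data=body,
        method="POST",
        headers={"Authorization": token, "Content-Type": "application/json"},
    )
```

Pass the result to urllib.request.urlopen to place the call; the agent in main.py must already be running.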
For optimal performance, run your agent in the same geographic region as your SIP provider. This reduces latency and improves call quality.
Next Steps
Congratulations! You've built an AI voice agent, run it locally, and connected it to WhatsApp over SIP for both inbound and outbound calls.
Got a question? Ask us on Discord.