Introduction
The AI Agent SDK is a Python framework built on top of the VideoSDK Python SDK that enables AI-powered agents to join VideoSDK rooms as participants. This SDK serves as a real-time bridge between AI models (like OpenAI and Gemini) and your users, facilitating seamless voice and media interactions.
High Level Architecture
This architecture shows how AI voice agents connect to VideoSDK meetings. The system links your backend with VideoSDK's platform, allowing AI assistants to interact with users in real-time.

System Components
- Your Backend: Hosts the Worker and Agent Job that powers the AI agents
- VideoSDK Cloud: Manages the meeting rooms where agents and users interact in real time
- Client SDK: Applications on user devices (web, mobile, or SIP) that connect to VideoSDK meetings
Process Flow
- Register: Your backend worker registers with the VideoSDK Cloud
- Initiate to join Room: The user initiates joining a VideoSDK Room via the Client SDK on their device
- Notify worker for Agent to join Room: The VideoSDK Cloud notifies your backend worker to have an Agent join the room.
- Agent joins the room: The Agent connects to the VideoSDK Room and can interact with the user.
Got a Question? Ask us on discord