Introduction
The AI Agent SDK is a Python framework built on top of the VideoSDK Python SDK that enables AI-powered agents to join VideoSDK rooms as participants. It acts as a real-time bridge between AI models (such as those from OpenAI or the Gemini family) and your users, carrying voice and media interactions between them.
High-Level Architecture
In this architecture, AI voice agents connect to VideoSDK meetings as participants. The system links your backend with VideoSDK's platform, allowing AI assistants to interact with users in real time.

Key Components
- Your Backend: Hosts the Worker and Agent Job that power the AI
- VideoSDK Cloud: Manages the meeting rooms where agents and users interact
- Client SDK: Applications on user devices (web, mobile, or SIP)
How It Works
The system follows a simple four-step process:
- Your Worker registers with VideoSDK Cloud
- A user joins a meeting room using a Client SDK
- VideoSDK notifies your Worker that an agent should join
- Your Agent connects to the room and interacts with users
This design separates AI processing (handled by your backend) from communication management (handled by VideoSDK), creating a flexible and scalable solution for voice-enabled AI assistants in video meetings.
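The four-step flow above can be sketched in plain Python. This is only a toy model to illustrate the order of events and does not use the AI Agent SDK or any VideoSDK API. Every name in it (`ToyCloud`, `Room`, `make_worker`, `register_worker`, `join`) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Room:
    """Hypothetical meeting room that records who has joined."""
    room_id: str
    participants: List[str] = field(default_factory=list)
    has_agent: bool = False


class ToyCloud:
    """Toy stand-in for VideoSDK Cloud (not the real service or its API)."""

    def __init__(self) -> None:
        self.rooms: Dict[str, Room] = {}
        self.workers: List[Callable[[Room], None]] = []

    def register_worker(self, on_job: Callable[[Room], None]) -> None:
        # Step 1: the worker registers a callback for agent jobs.
        self.workers.append(on_job)

    def join(self, room_id: str, name: str, is_agent: bool = False) -> Room:
        # Step 2 (users) and step 4 (agents): a participant joins the room.
        room = self.rooms.setdefault(room_id, Room(room_id))
        room.participants.append(name)
        # Step 3: a user arrived and no agent is assigned yet, so notify a worker.
        if not is_agent and not room.has_agent and self.workers:
            room.has_agent = True
            self.workers[0](room)
        return room


def make_worker(cloud: ToyCloud, agent_name: str = "agent") -> Callable[[Room], None]:
    """Build a worker callback that sends an agent into the notified room."""

    def on_job(room: Room) -> None:
        # Step 4: the agent connects to the room as a participant.
        cloud.join(room.room_id, agent_name, is_agent=True)

    return on_job


cloud = ToyCloud()
cloud.register_worker(make_worker(cloud))   # step 1
room = cloud.join("demo-room", "alice")     # steps 2-4
print(room.participants)                    # ['alice', 'agent']
```

In this sketch the worker and agent logic live in `make_worker` (your backend), while room bookkeeping and notification live in `ToyCloud`, mirroring the split between AI processing and communication management described above.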