Introduction

The AI Agent SDK is a Python framework built on top of the VideoSDK Python SDK that enables AI-powered agents to join VideoSDK rooms as participants. The SDK acts as a real-time bridge between AI models (such as those from OpenAI and Gemini) and your users, carrying voice and media between them.

High Level Architecture

This architecture shows how AI voice agents connect to VideoSDK meetings. The system links your backend with VideoSDK's platform, allowing AI assistants to interact with users in real time.

Key Components

Your Backend: Hosts the Worker and Agent Job that power the AI
VideoSDK Cloud: Manages the meeting rooms where agents and users interact
Client SDK: Applications on user devices (web, mobile, or SIP)

How It Works

The system follows a simple four-step process:

1. Your Worker registers with VideoSDK Cloud
2. A user joins a meeting room using a Client SDK
3. VideoSDK notifies your Worker that an agent should join
4. Your Agent connects to the room and interacts with users

This design separates AI processing (handled by your backend) from communication management (handled by VideoSDK), creating a flexible and scalable solution for voice-enabled AI assistants in video meetings.
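
The four-step flow above can be sketched as a small in-memory simulation. This is only an illustration of the ordering of events, not the SDK's API: `ToyCloud`, `Room`, `make_agent_job`, and `run_demo` are hypothetical names invented for this sketch, and no real VideoSDK calls are made.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Room:
    """A meeting room; in the real system this is managed by VideoSDK Cloud."""
    room_id: str
    participants: List[str] = field(default_factory=list)


class ToyCloud:
    """Hypothetical in-memory stand-in for VideoSDK Cloud (not a real API)."""

    def __init__(self) -> None:
        self.rooms: Dict[str, Room] = {}
        self.workers: List[Callable[[Room], None]] = []
        self.log: List[str] = []

    def register_worker(self, on_job: Callable[[Room], None]) -> None:
        # Step 1: the Worker registers and waits to be told about jobs.
        self.workers.append(on_job)
        self.log.append("worker registered")

    def join(self, room_id: str, name: str) -> Room:
        # Create the room on first use, then add the participant.
        room = self.rooms.setdefault(room_id, Room(room_id))
        room.participants.append(name)
        self.log.append(f"{name} joined {room_id}")
        return room

    def user_joins(self, room_id: str, user: str) -> None:
        # Step 2: a user joins the room through a Client SDK.
        room = self.join(room_id, user)
        # Step 3: the cloud notifies a registered Worker, once per room.
        if self.workers and "agent" not in room.participants:
            self.log.append(f"worker notified for {room_id}")
            self.workers[0](room)


def make_agent_job(cloud: ToyCloud) -> Callable[[Room], None]:
    """Build the callback the Worker runs when it is notified."""
    def agent_job(room: Room) -> None:
        # Step 4: the Agent connects to the room as a participant.
        cloud.join(room.room_id, "agent")
    return agent_job


def run_demo() -> List[str]:
    cloud = ToyCloud()
    cloud.register_worker(make_agent_job(cloud))
    cloud.user_joins("room-1", "alice")
    return cloud.log


if __name__ == "__main__":
    for event in run_demo():
        print(event)
```

In this sketch the agent callback lives on the backend side while the room bookkeeping lives in the cloud stand-in, mirroring the separation between AI processing and communication management described above.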
