Introduction

The AI Agent SDK is a Python framework built on top of the VideoSDK Python SDK that lets AI-powered agents join VideoSDK rooms as participants. It acts as a real-time bridge between AI models (such as OpenAI and Gemini) and your users for voice and media interactions.

High-Level Architecture

The architecture connects AI voice agents to VideoSDK meetings. It links your backend with VideoSDK's platform so that AI assistants can interact with users in real time.

Key Components

  • Your Backend: Hosts the Worker and Agent Job that power the AI agent
  • VideoSDK Cloud: Manages the meeting rooms where agents and users interact
  • Client SDK: Applications on user devices (web, mobile, or SIP)

How It Works

The system follows a simple four-step process:

  • Your Worker registers with VideoSDK Cloud
  • A user joins a meeting room using a Client SDK
  • VideoSDK notifies your Worker that an agent should join
  • Your Agent connects to the room and interacts with users

This design separates AI processing (handled by your backend) from communication management (handled by VideoSDK), creating a flexible and scalable solution for voice-enabled AI assistants in video meetings.
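
The four steps and the split of responsibilities can be sketched as a small in-memory simulation. This is an illustration only: `FakeCloud`, `Worker`, `Agent`, `Room`, and `run_demo` are hypothetical names invented for this sketch and are not part of the VideoSDK API. The sketch also assumes that the cloud dispatches one agent per room when the first user joins, which the description above does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Room:
    """A meeting room and the names of its current participants."""
    room_id: str
    participants: List[str] = field(default_factory=list)


class FakeCloud:
    """Hypothetical stand-in for VideoSDK Cloud: owns rooms, notifies workers."""

    def __init__(self) -> None:
        self.rooms: Dict[str, Room] = {}
        self.workers: List["Worker"] = []
        self.dispatched: Set[str] = set()  # rooms that already have an agent

    def register_worker(self, worker: "Worker") -> None:
        # Step 1: the Worker registers with the cloud.
        self.workers.append(worker)

    def join(self, room_id: str, name: str) -> Room:
        # Shared join path for users and agents alike.
        room = self.rooms.setdefault(room_id, Room(room_id))
        room.participants.append(name)
        return room

    def user_joins(self, room_id: str, user: str) -> Room:
        # Step 2: a user joins a room through a client.
        room = self.join(room_id, user)
        # Step 3: notify a Worker that an agent should join.
        # Assumption: one agent per room, dispatched on the first user join.
        if self.workers and room_id not in self.dispatched:
            self.dispatched.add(room_id)
            self.workers[0].on_job(self, room_id)
        return room


class Agent:
    """Hypothetical agent; the AI processing would live here, in your backend."""

    def __init__(self, name: str) -> None:
        self.name = name

    def connect(self, cloud: FakeCloud, room_id: str) -> Room:
        # Step 4: the Agent connects to the room as a participant.
        return cloud.join(room_id, self.name)


class Worker:
    """Hypothetical worker that starts an agent job when notified."""

    def __init__(self, agent_name: str = "ai-agent") -> None:
        self.agent_name = agent_name

    def on_job(self, cloud: FakeCloud, room_id: str) -> None:
        Agent(self.agent_name).connect(cloud, room_id)


def run_demo() -> List[str]:
    cloud = FakeCloud()
    cloud.register_worker(Worker())
    room = cloud.user_joins("room-1", "alice")
    return room.participants


if __name__ == "__main__":
    print(run_demo())  # ['alice', 'ai-agent']
```

In this sketch the cloud object only tracks rooms and sends notifications, while the `Worker` and `Agent` hold everything specific to the AI, mirroring the separation described above.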