Version: 0.0.x

Real-time AI Agent - Overview

In traditional online meetings, participants join, talk, listen, and collaborate. But what if one of those participants wasn't human, yet could think, respond, and handle repetitive tasks? With VideoSDK, you can integrate AI agents into meetings, where they can automate workflows, assist participants, and enhance collaboration.

Realtime AI Agent Workflow

Building an AI agent that listens, thinks, and speaks feels like giving life to a virtual presence. Each of these capabilities adds a layer of intelligence, turning the AI from a passive observer to an active participant in the meeting. Here's how these three pillars of interaction come together:

| Capability | Tools You Could Use | Description |
| --- | --- | --- |
| STT | Deepgram, Google Speech-to-Text, AssemblyAI, Whisper, Sensory's TrulyHandsfree | Tools that convert spoken language into text, enabling applications to process and understand human speech. |
| LLM | OpenAI's GPT models, Google Bard, Claude by Anthropic, Meta's Llama 4, Amazon's Alexa+ | Advanced language models and AI assistants capable of understanding context, generating human-like text, and engaging in meaningful conversations. |
| TTS | Eleven Labs, Amazon Polly, Microsoft Azure Speech Service, WaveNet, Sanas, Uniphore | Solutions that transform text into natural-sounding speech, allowing applications to communicate with users audibly. |
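
One way to picture how the three capabilities chain together is the minimal sketch below. The names `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical placeholders, not part of VideoSDK or any provider's API; in a real agent each stage would call whichever STT, LLM, or TTS service you choose.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the three stages: speech -> text -> reply -> speech."""
    transcribe: Callable[[bytes], str]      # STT: audio bytes in, text out
    generate_reply: Callable[[str], str]    # LLM: user text in, reply text out
    synthesize: Callable[[str], bytes]      # TTS: reply text in, audio bytes out

    def handle_utterance(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        if not text.strip():
            return b""  # nothing intelligible was said, so stay silent
        reply = self.generate_reply(text)
        return self.synthesize(reply)


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
reply_audio = pipeline.handle_utterance(b"hello")
```

Keeping each stage behind a plain callable makes it straightforward to swap one provider for another without touching the rest of the agent.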

Workflow - Add an AI Agent to a Meeting

At VideoSDK, we are always looking for new ways to connect people virtually, whether in a game, an online meeting, or any scenario that calls for a human presence in a virtual setting. The next step is a human-like participant: an AI copilot built for a specific role. The steps below show how to add one to a meeting.

1. Set Up Environment & Dependencies

Install our Python SDK with pip, Python's package installer. Make sure a Python environment is configured on your machine before installing. The package is published on PyPI as videosdk.

pip install videosdk

2. Connect to an Existing Meeting

The MeetingConfig class helps configure initial agent settings before joining a meeting. Since AI Agents don't have physical microphones, you can create a custom microphone audio track.

from videosdk import MeetingConfig, VideoSDK

# Initial settings for the AI Agent.
# name, meeting_id, authToken, and self.audio_track are supplied by your own code.
meeting_config = MeetingConfig(
    name=name,
    meeting_id=meeting_id,
    token=authToken,
    mic_enabled=True,
    webcam_enabled=False,
    custom_microphone_audio_track=self.audio_track
)

Next, create an instance connected to an active session. This session can be running on any platform (e.g., iOS, Web, Android).

agent = VideoSDK.init_meeting(**meeting_config)

Finally, call the join() method on the meeting instance to connect the AI Agent to the existing meeting.

agent.join()
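
A custom microphone track ultimately has to supply audio in small, fixed-duration chunks. The sketch below shows only that framing step. It assumes 16-bit mono PCM at 48 kHz split into 20 ms frames, which are common values for real-time audio but are assumptions here, and the helper `split_pcm_frames` is hypothetical rather than part of the VideoSDK API. How the frames are handed to the track depends on the SDK's custom-track class, which this page does not show.

```python
from typing import Iterator

SAMPLE_RATE = 48_000     # samples per second (assumed)
BYTES_PER_SAMPLE = 2     # 16-bit mono PCM (assumed)
FRAME_MS = 20            # frame duration in milliseconds (assumed)


def split_pcm_frames(
    pcm: bytes,
    sample_rate: int = SAMPLE_RATE,
    frame_ms: int = FRAME_MS,
) -> Iterator[bytes]:
    """Yield fixed-size PCM frames, padding the last one with silence."""
    samples_per_frame = (sample_rate * frame_ms) // 1000
    frame_bytes = samples_per_frame * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        # A short final frame is right-padded with zero bytes (silence).
        yield frame.ljust(frame_bytes, b"\x00")


# One second of silence splits into fifty 20 ms frames.
one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)
frames = list(split_pcm_frames(one_second))
```

Fixed-size frames keep playback timing predictable, which is why the final partial frame is padded rather than sent short.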

3. Handle Meeting Events

Implement meeting event callbacks by extending VideoSDK's MeetingEventHandler class.

from videosdk import Participant, MeetingEventHandler
from participant_events import MyParticipantEventHandler

class MyMeetingEventHandler(MeetingEventHandler):
    def __init__(self):
        super().__init__()

    def on_meeting_joined(self, data):
        print("Meeting joined:", data)

    def on_meeting_left(self, data):
        print("Meeting left:", data)

    def on_participant_joined(self, participant: Participant):
        print("Participant joined:", participant)
        # Attach a per-participant listener for stream events
        participant.add_event_listener(
            MyParticipantEventHandler(participant_id=participant.id)
        )

    def on_participant_left(self, participant: Participant):
        print("Participant left:", participant)

Implement event listeners for remote participants in the same way, by extending ParticipantEventHandler.

from videosdk import ParticipantEventHandler, Stream

class MyParticipantEventHandler(ParticipantEventHandler):
    def __init__(self, participant_id: str):
        super().__init__()
        self.participant_id = participant_id

    def on_stream_enabled(self, stream: Stream):
        print("Participant stream enabled:", self.participant_id, stream.kind)

    def on_stream_disabled(self, stream: Stream):
        print("Participant stream disabled:", self.participant_id, stream.kind)
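
Handlers like these often keep a little state, for example a roster of who is currently in the meeting. The sketch below illustrates that pattern. So that it runs without the SDK installed, it uses a stand-in base class `MeetingEventHandlerStub` and `SimpleNamespace` objects in place of real participants; both are assumptions for illustration only. In a real agent the class would extend VideoSDK's MeetingEventHandler instead.

```python
from types import SimpleNamespace


class MeetingEventHandlerStub:
    """Stand-in for videosdk.MeetingEventHandler (assumption, for illustration)."""


class RosterHandler(MeetingEventHandlerStub):
    """Tracks the participants currently in the meeting, keyed by id."""

    def __init__(self):
        super().__init__()
        self.roster = {}

    def on_participant_joined(self, participant):
        self.roster[participant.id] = participant

    def on_participant_left(self, participant):
        # pop with a default so a duplicate "left" event is harmless
        self.roster.pop(participant.id, None)


# Simulate the callbacks the SDK would invoke.
handler = RosterHandler()
alice = SimpleNamespace(id="p1")
bob = SimpleNamespace(id="p2")
handler.on_participant_joined(alice)
handler.on_participant_joined(bob)
handler.on_participant_left(alice)
remaining = sorted(handler.roster)
```

A roster like this lets the agent decide, for instance, whether anyone is left to talk to before it generates a reply.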

4. Working Example

The code below is a working example in which an AI agent joins an existing meeting through a FastAPI application. The application listens for client requests, reads the meeting credentials (meeting_id and token) from the request, and joins the agent to that meeting. The full project is available as the AI Agent FastAPI example.

from videosdk import MeetingConfig, VideoSDK

class AIAgent:
    def __init__(self, meeting_id: str, authToken: str, name: str):
        # MeetingConfig
        self.meeting_config = MeetingConfig(
            name=name,
            meeting_id=meeting_id,
            token=authToken,
            mic_enabled=False,
            webcam_enabled=False
        )
        # Create instance of existing session
        self.agent = VideoSDK.init_meeting(**self.meeting_config)

    async def join(self):
        await self.agent.async_join()

Next, we create a basic server that accepts a client request and joins the AI Agent to an existing session.

from fastapi.middleware.cors import CORSMiddleware
from fastapi import FastAPI
from pydantic import BaseModel
from agent import AIAgent

port = 8000
app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

ai_agent = None

class MeetingReqConfig(BaseModel):
    meeting_id: str
    token: str

@app.get("/test")
async def test():
    return {"message": "CORS is working!"}

# Join AI agent
@app.post("/join-player")
async def join_player(req: MeetingReqConfig):
    global ai_agent
    ai_agent = AIAgent(req.meeting_id, req.token, "AI")
    await ai_agent.join()
    return {"message": "AI agent joined"}

# Run the server on the configured port ("main:app" assumes this file is main.py)
if __name__ == "__main__":
    import uvicorn
    uvicorn.run("main:app", host="127.0.0.1", port=port)
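
To trigger the join from a client, the request only needs to carry the two fields defined in MeetingReqConfig. The sketch below builds such a request with Python's standard library. The helper names `build_join_request` and `send` are hypothetical, and the meeting id and token shown are placeholders to replace with your own values; the base URL assumes the server above is running locally on port 8000.

```python
import json
import urllib.request


def build_join_request(base_url: str, meeting_id: str, token: str) -> urllib.request.Request:
    """Build the POST request that asks the server to join the AI agent."""
    body = json.dumps({"meeting_id": meeting_id, "token": token}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/join-player",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder credentials; substitute a real meeting id and auth token.
request = build_join_request("http://127.0.0.1:8000", "your-meeting-id", "your-token")
```

With the server running, `send(request)` should return the JSON body produced by the join_player endpoint.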

Use Cases

Imagine a meeting where an intelligent participant listens, reflects, and responds in a natural, conversational way. The projects below show how AI agents can be integrated into a virtual workspace to support communication and collaboration.

Explore Our AI Agent Projects

| Project | Description |
| --- | --- |
| AI Voice Agent with Deepgram STT | A real-time meeting assistant that harnesses advanced speech-to-text and voice synthesis technologies to deliver dynamic, natural interactions. |
| AI Game Agent with ElevenLabs STT | An engaging game agent that combines voice interaction with real-time communication, bringing a human-like presence to your gaming experiences. |
| AI Translator Agent with OpenAI Realtime API | A sophisticated translation tool that facilitates seamless multilingual communication, breaking down language barriers during online meetings. |

Got a question? Ask us on Discord.