Real-time AI Agent - Overview
In traditional online meetings, participants join, talk, listen, and collaborate. But what if one of those participants wasn't human, yet could think, respond, and handle repetitive tasks? With VideoSDK, you can integrate AI agents into meetings, where they can automate workflows, assist participants, and enhance collaboration.
Building an AI agent that listens, thinks, and speaks feels like giving life to a virtual presence. Each of these capabilities adds a layer of intelligence, turning the AI from a passive observer to an active participant in the meeting. Here's how these three pillars of interaction come together:
Capability | Tools You Could Use | Description |
---|---|---|
STT | Deepgram, Google Speech-to-Text, AssemblyAI, Whisper, Sensory's TrulyHandsfree | Tools that convert spoken language into text, enabling applications to process and understand human speech. |
LLM | OpenAI's GPT Models, Google Gemini (formerly Bard), Claude by Anthropic, Meta's Llama 4, Amazon's Alexa+ | Advanced language models and AI assistants capable of understanding context, generating human-like text, and engaging in meaningful conversations. |
TTS | Eleven Labs, Amazon Polly, Microsoft Azure Speech Service, WaveNet, Sanas, Uniphore | Solutions that transform text into natural-sounding speech, allowing applications to communicate with users audibly. |
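The three capabilities in the table chain into one conversational turn: audio in, transcript, reply text, audio out. The sketch below illustrates that loop under the assumption that each stage is a plain callable. `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names used only for illustration; they are not part of VideoSDK or of any provider's SDK, and a real agent would replace the stubs with calls to the chosen STT, LLM, and TTS services.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage is a plain callable so providers can be swapped independently.
SpeechToText = Callable[[bytes], str]   # audio bytes -> transcript
LanguageModel = Callable[[str], str]    # transcript -> reply text
TextToSpeech = Callable[[str], bytes]   # reply text -> audio bytes


@dataclass
class VoicePipeline:
    """Hypothetical wrapper that runs one listen-think-speak turn."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)   # listen
        reply = self.llm(transcript)      # think
        return self.tts(reply)            # speak


# Stub stages standing in for real provider calls (assumptions, not real APIs).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.respond(b"hello")
```

Keeping the stages behind simple callables means a provider can be changed (for example, one STT service for another) without touching the rest of the agent.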
Workflow - Add an AI Agent to a Meeting
At VideoSDK, we are always keen on finding new ways to connect people virtually. Whether it’s a game, an online meeting, or any scenario that demands a human presence in a virtual setting, we strive to push the boundaries. So, why not take it a step further and introduce a truly human-like participant—an AI copilot meticulously crafted for its role?
1. Set Up Environment & Dependencies
Install the VideoSDK Python SDK with pip, Python's package installer; the package is published on PyPI as videosdk. Make sure a Python environment is configured on your machine before installing.
pip install videosdk
2. Connect to an Existing Meeting
The MeetingConfig class configures the agent's initial settings before it joins a meeting. Because AI agents have no physical microphone, you can supply a custom microphone audio track instead.
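from videosdk import MeetingConfig, VideoSDK

# Initial settings for the AI agent.
# name, meeting_id, authToken, and self.audio_track are placeholders defined
# elsewhere; the use of self suggests this runs inside an agent class method.
meeting_config = MeetingConfig(
    name=name,
    meeting_id=meeting_id,
    token=authToken,
    mic_enabled=True,
    webcam_enabled=False,
    custom_microphone_audio_track=self.audio_track,
)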
Next, create an instance connected to an active session. This session can be running on any platform (e.g., iOS, Web, Android).
agent = VideoSDK.init_meeting(**meeting_config)
Finally, call the join() method on the meeting instance to connect the AI agent to the existing meeting.
agent.join()
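Step 2 mentions a custom microphone audio track in place of a physical microphone. Whatever object wraps it, such a track ultimately has to produce a steady stream of audio frames. The sketch below shows only that part: it generates fixed-size frames of 16-bit mono PCM using the standard library. The 48 kHz sample rate, the 20 ms frame size, and the `sine_frames` helper are assumptions chosen for illustration; how the frames are wrapped into a track object that VideoSDK accepts is not shown here and should be taken from the SDK reference.

```python
import math
import struct
from typing import Iterator

SAMPLE_RATE = 48_000   # samples per second (assumed; match your TTS output)
FRAME_MS = 20          # duration of each frame in milliseconds (assumed)
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 960 samples per frame


def sine_frames(freq_hz: float, n_frames: int) -> Iterator[bytes]:
    """Yield n_frames of 16-bit little-endian mono PCM containing a sine tone."""
    for frame_index in range(n_frames):
        start = frame_index * SAMPLES_PER_FRAME
        samples = [
            # Half of full scale to leave headroom and avoid clipping.
            int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * (start + i) / SAMPLE_RATE))
            for i in range(SAMPLES_PER_FRAME)
        ]
        yield struct.pack(f"<{SAMPLES_PER_FRAME}h", *samples)


# Five frames (100 ms) of a 440 Hz test tone.
frames = list(sine_frames(freq_hz=440.0, n_frames=5))
```

In a real agent the tone generator would be replaced by the audio returned from the TTS stage, split into frames of the same size.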
3. Handle Meeting Events
Implement meeting event callbacks by extending VideoSDK's MeetingEventHandler class.
from videosdk import Participant, MeetingEventHandler
from participant_events import MyParticipantEventHandler


class MyMeetingEventHandler(MeetingEventHandler):
    def __init__(self):
        super().__init__()

    def on_meeting_joined(self, data):
        print("Meeting joined:", data)

    def on_meeting_left(self, data):
        print("Meeting left:", data)

    def on_participant_joined(self, participant: Participant):
        print("Participant joined:", participant)
        # Attach a per-participant listener to receive stream events.
        participant.add_event_listener(
            MyParticipantEventHandler(participant_id=participant.id)
        )

    def on_participant_left(self, participant: Participant):
        print("Participant left:", participant)
Implement remote participant event listeners using similar patterns.
from videosdk import ParticipantEventHandler, Stream


class MyParticipantEventHandler(ParticipantEventHandler):
    def __init__(self, participant_id: str):
        super().__init__()
        self.participant_id = participant_id

    def on_stream_enabled(self, stream: Stream):
        print("Participant stream enabled:", self.participant_id, stream.kind)

    def on_stream_disabled(self, stream: Stream):
        print("Participant stream disabled:", self.participant_id, stream.kind)
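The handlers above only print each event. An agent usually needs to remember what those events told it, for example who is present and who currently has a live audio stream. The sketch below keeps that state in a small class that the callbacks could delegate to. `Roster` and its method names are hypothetical, and the stream kinds `"audio"` and `"video"` are assumed values for `stream.kind`; the class deliberately has no VideoSDK dependency so it can be exercised on its own.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Roster:
    """Hypothetical helper tracking who is present and which streams are live."""
    streams: Dict[str, Set[str]] = field(default_factory=dict)

    def participant_joined(self, participant_id: str) -> None:
        self.streams.setdefault(participant_id, set())

    def participant_left(self, participant_id: str) -> None:
        self.streams.pop(participant_id, None)

    def stream_enabled(self, participant_id: str, kind: str) -> None:
        self.streams.setdefault(participant_id, set()).add(kind)

    def stream_disabled(self, participant_id: str, kind: str) -> None:
        # Ignore events for participants that are no longer tracked.
        self.streams.get(participant_id, set()).discard(kind)

    def speakers(self) -> List[str]:
        """Participants with a live audio stream, sorted for stable output."""
        return sorted(p for p, kinds in self.streams.items() if "audio" in kinds)


# Replay a short sequence of events as the handlers would report them.
roster = Roster()
roster.participant_joined("alice")
roster.participant_joined("bob")
roster.stream_enabled("alice", "audio")
roster.stream_enabled("bob", "video")
roster.stream_disabled("alice", "audio")
roster.stream_enabled("bob", "audio")
roster.participant_left("alice")
```

With this in place, `on_participant_joined` would call `roster.participant_joined(participant.id)` and `on_stream_enabled` would call `roster.stream_enabled(self.participant_id, stream.kind)`, which keeps the event handlers thin.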
4. Working Example
The code below is a working example in which an AI agent joins an existing meeting through a FastAPI application. The application listens for client requests, reads meeting credentials such as meeting_id and token, and joins the existing meeting.
# agent.py
from videosdk import MeetingConfig, VideoSDK


class AIAgent:
    def __init__(self, meeting_id: str, authToken: str, name: str):
        # Initial meeting settings: join without microphone or webcam.
        self.meeting_config = MeetingConfig(
            name=name,
            meeting_id=meeting_id,
            token=authToken,
            mic_enabled=False,
            webcam_enabled=False,
        )
        # Create an instance connected to the existing session.
        self.agent = VideoSDK.init_meeting(**self.meeting_config)

    async def join(self):
        await self.agent.async_join()
Next, create a basic server with an endpoint that the client calls to add the AI agent to an existing session.
# main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

from agent import AIAgent  # the AIAgent class shown above, saved as agent.py

port = 8000
app = FastAPI()

# Permissive CORS settings for local development.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

ai_agent = None


class MeetingReqConfig(BaseModel):
    meeting_id: str
    token: str


@app.get("/test")
async def test():
    return {"message": "CORS is working!"}


# Join the AI agent to the meeting named in the request body.
@app.post("/join-player")
async def join_player(req: MeetingReqConfig):
    global ai_agent
    ai_agent = AIAgent(req.meeting_id, req.token, "AI")
    await ai_agent.join()
    return {"message": "AI agent joined"}


# Run the server on the configured port.
if __name__ == "__main__":
    import uvicorn

    uvicorn.run("main:app", host="127.0.0.1", port=port)
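To exercise the server, a client needs to POST a JSON body with `meeting_id` and `token` to `/join-player`. The sketch below builds that request with the standard library only. `build_join_request` and `send_join_request` are hypothetical helper names, and the meeting ID and token are placeholders; sending the request assumes the server above is already running on 127.0.0.1:8000.

```python
import json
import urllib.request


def build_join_request(base_url: str, meeting_id: str, token: str) -> urllib.request.Request:
    """Build the POST request that asks the server to add the agent to a meeting."""
    body = json.dumps({"meeting_id": meeting_id, "token": token}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/join-player",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_join_request(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs the server running)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder credentials; replace with a real meeting ID and auth token.
request = build_join_request("http://127.0.0.1:8000", "my-meeting-id", "my-token")
```

Calling `send_join_request(request)` against the running server should return the JSON message produced by the `join_player` endpoint. Building and sending are kept separate so the request can be inspected or logged before any network call is made.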
Use Cases
Imagine a meeting where an intelligent participant listens, reflects, and responds with a human touch—making your digital interactions feel natural and engaging. Our projects exemplify how AI agents can be seamlessly integrated into your virtual workspace, enhancing communication and collaboration with lifelike interaction.
Explore Our AI Agent Projects
Project | Description |
---|---|
AI Voice Agent with Deepgram STT | A real-time meeting assistant that harnesses advanced speech-to-text and voice synthesis technologies to deliver dynamic, natural interactions. |
AI Game Agent with ElevenLabs STT | An engaging game agent that combines voice interaction with real-time communication, bringing a human-like presence to your gaming experiences. |
AI Translator Agent with OpenAI Realtime API | A sophisticated translation tool that facilitates seamless multilingual communication, breaking down language barriers during online meetings. |
Got a question? Ask us on Discord.