Agent
The Agent class is the base class for defining AI agent behavior and capabilities. It provides the foundation for creating intelligent conversational agents with support for function tools, MCP servers, and advanced lifecycle management.
Basic Usage
Simple Agent
The following example initializes a simple agent by subclassing Agent. The instructions argument defines how the agent should behave.
from videosdk.agents import Agent

class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant."
        )
Agent with Function Tools
Function tools allow your agent to perform actions and interact with external services, extending its capabilities beyond simple conversation. You can register tools that are defined either outside or inside your agent class.
External Tools
External tools are defined as standalone functions and are passed into the agent's constructor via the tools list. This is useful for sharing common tools across multiple agents.
from videosdk.agents import Agent, function_tool

# External tool defined outside the class
@function_tool(description="Get weather information")
def get_weather(location: str) -> str:
    """Get weather information for a specific location."""
    # Weather logic here
    return f"Weather in {location}: Sunny, 72°F"

class WeatherAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a weather assistant.",
            tools=[get_weather]  # Register the external tool
        )
Internal Tools
Internal tools are defined as methods within your agent class and are decorated with @function_tool. This is useful for logic that is specific to the agent and needs access to its internal state (self).
from videosdk.agents import Agent, function_tool

class FinanceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful financial assistant."
        )
        self.portfolio = {"AAPL": 10, "GOOG": 5}

    @function_tool
    def get_portfolio_value(self) -> dict:
        """Get the current value of the user's stock portfolio."""
        # In a real scenario, you'd fetch live stock prices
        # This is a simplified example
        return {"total_value": 5000, "holdings": self.portfolio}
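To make the difference between external and internal tools concrete, here is a minimal, standard-library-only sketch of how a decorator could tag functions as tools and how an agent could collect both kinds. The names tool, collect_tools, and _is_tool are hypothetical and are not part of the videosdk.agents API; the real function_tool implementation may work differently.

```python
import inspect
from typing import Callable


def tool(func: Callable) -> Callable:
    """Hypothetical stand-in for a tool decorator: tags the function."""
    func._is_tool = True
    return func


def collect_tools(obj: object, external: list[Callable]) -> dict[str, Callable]:
    """Gather external tools plus any tagged methods found on obj."""
    tools = {f.__name__: f for f in external}
    for name, member in inspect.getmembers(obj, predicate=inspect.ismethod):
        if getattr(member, "_is_tool", False):
            tools[name] = member  # bound method, so it can read self
    return tools


@tool
def get_weather(location: str) -> str:
    """External tool: a plain function with no access to agent state."""
    return f"Weather in {location}: Sunny"


class FinanceSketch:
    def __init__(self) -> None:
        self.portfolio = {"AAPL": 10, "GOOG": 5}

    @tool
    def get_holdings(self) -> dict:
        """Internal tool: a method that reads state through self."""
        return dict(self.portfolio)


registry = collect_tools(FinanceSketch(), external=[get_weather])
print(sorted(registry))
print(registry["get_holdings"]())
```

The internal tool is stored as a bound method, which is why it can read self.portfolio without extra arguments, while the external tool stays reusable across any number of agent classes.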
Agent with MCP Server
MCPServerStdio enables your agent to communicate with external processes via standard input/output streams. This is ideal for integrating complex, standalone Python scripts or other local executables as tools.
import sys
from pathlib import Path
from videosdk.agents import Agent, MCPServerStdio

# Path to your external Python script that runs the MCP server
mcp_server_path = Path(__file__).parent / "mcp_server_script.py"

class MCPAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are an assistant that can leverage external tools via MCP.",
            mcp_servers=[
                MCPServerStdio(
                    executable_path=sys.executable,
                    process_arguments=[str(mcp_server_path)],
                    session_timeout=30
                )
            ]
        )
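As a rough illustration of what communicating over standard input/output means, the sketch below launches a child Python process, writes one newline-delimited JSON-RPC-style request to its stdin, and reads one reply from its stdout. The child script, the call_child helper, and the ping method are all hypothetical; the child only imitates the message shape and is not a real MCP server, and this is not how MCPServerStdio is implemented internally.

```python
import json
import subprocess
import sys

# Hypothetical child process: reads one JSON line and answers it.
# It only imitates the message shape; it is not a real MCP server.
CHILD_SOURCE = (
    "import sys, json\n"
    "request = json.loads(sys.stdin.readline())\n"
    "reply = {'jsonrpc': '2.0', 'id': request['id'], 'result': 'pong'}\n"
    "sys.stdout.write(json.dumps(reply) + '\\n')\n"
    "sys.stdout.flush()\n"
)


def call_child(method: str, request_id: int = 1) -> dict:
    """Send one request over stdin and read one reply from stdout."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": method}
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=json.dumps(request) + "\n",
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )
    return json.loads(completed.stdout)


response = call_child("ping")
print(response)
```

Because stdout is the message channel in this arrangement, a child process that also prints diagnostics to stdout would corrupt the exchange; diagnostics belong on stderr.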
Agent Lifecycle and Methods
The Agent class provides lifecycle hooks and methods to manage state and behavior at critical points in the agent's session.
Lifecycle Hooks
These methods are designed to be overridden in your custom agent class to implement specific behaviors.
- async def on_enter(self) -> None: Called once when the agent successfully joins the meeting. This is the ideal place for introductions or initial actions, such as greeting participants.
- async def on_exit(self) -> None: Called when the agent is about to exit the meeting. Use this for cleanup tasks or for saying goodbye.
from videosdk.agents import Agent

class LifecycleAgent(Agent):
    async def on_enter(self):
        print("Agent has entered the meeting.")
        await self.session.say("Hello everyone! I'm here to help.")

    async def on_exit(self):
        print("Agent is exiting the meeting.")
        await self.session.say("It was a pleasure assisting you. Goodbye!")
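The hook pattern itself can be sketched without the SDK. The example below assumes a runner that calls on_enter before the session body and on_exit afterwards, even if the body raises. BaseSketch, GreetingSketch, and run_session are hypothetical names, and the source does not state whether VideoSDK calls on_exit after an error, so treat that behavior as an assumption of this sketch.

```python
import asyncio


class BaseSketch:
    """Hypothetical base class with overridable lifecycle hooks."""

    async def on_enter(self) -> None:
        pass

    async def on_exit(self) -> None:
        pass


async def run_session(agent: BaseSketch, work) -> None:
    """Call on_enter, run the session body, then always call on_exit."""
    await agent.on_enter()
    try:
        await work()
    finally:
        await agent.on_exit()


class GreetingSketch(BaseSketch):
    def __init__(self) -> None:
        self.events: list[str] = []

    async def on_enter(self) -> None:
        self.events.append("enter")

    async def on_exit(self) -> None:
        self.events.append("exit")


async def failing_work() -> None:
    raise RuntimeError("session body failed")


agent = GreetingSketch()
try:
    asyncio.run(run_session(agent, failing_work))
except RuntimeError:
    pass  # the error still propagates after on_exit has run
print(agent.events)
```

Placing the exit hook in a finally clause is one way to make sure cleanup logic runs regardless of how the session ends.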
Human in the Loop (HITL)
Human in the Loop enables AI agents to escalate specific queries to human operators for review and approval. This implementation uses Discord as the human interface through an MCP server, allowing seamless handoffs between AI automation and human oversight.
Use Cases
- Discount Requests: AI escalates pricing queries to human sales agents
- Complex Support: Technical issues requiring human expertise
- Policy Decisions: Requests that need human approval or clarification
- Escalation Scenarios: Situations where AI confidence is low
Implementation
The HITL pattern combines the Agent's MCP server capability with a Discord-based human interface:
- Agent Configuration
- Discord MCP Server
- Environment Variables
from videosdk.agents import Agent, MCPServerStdio, CascadingPipeline, AgentSession, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.anthropic import AnthropicLLM
from videosdk.plugins.google import GoogleTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector
import pathlib
import sys
import os
from typing import Optional

class CustomerAgent(Agent):
    def __init__(self, ctx: Optional[JobContext] = None):
        current_dir = pathlib.Path(__file__).parent
        discord_mcp_server_path = current_dir / "discord_mcp_server.py"
        super().__init__(
            instructions="You are a customer-facing agent for VideoSDK. You have access to various tools to assist with customer inquiries, provide support, and handle tasks. When a user asks for a discount percentage, always use the appropriate tool to retrieve and provide the accurate answer from your superior human agent.",
            mcp_servers=[
                MCPServerStdio(
                    executable_path=sys.executable,
                    process_arguments=[str(discord_mcp_server_path)],
                    session_timeout=30
                ),
            ]
        )
        self.ctx = ctx

    async def on_enter(self) -> None:
        """Called when the agent first joins the meeting"""
        await self.session.say("Hi! I'm your VideoSDK customer support agent. How can I help you today?")

    async def on_exit(self) -> None:
        """Called when the agent exits the meeting"""
        await self.session.say("Thank you for contacting VideoSDK support. Have a great day!")

# Pipeline configuration integrated into the main setup
def create_pipeline() -> CascadingPipeline:
    """Create and configure the cascading pipeline with all components"""
    return CascadingPipeline(
        stt=DeepgramSTT(api_key=os.getenv("DEEPGRAM_API_KEY")),
        llm=AnthropicLLM(api_key=os.getenv("ANTHROPIC_API_KEY")),
        tts=GoogleTTS(api_key=os.getenv("GOOGLE_API_KEY")),
        vad=SileroVAD(),
        turn_detector=TurnDetector(threshold=0.8)
    )

async def start_session(ctx: JobContext):
    """Main entry point that creates agent, pipeline, and starts the session"""
    # Create the pipeline
    pipeline = create_pipeline()
    # Create the agent with context
    agent = CustomerAgent(ctx=ctx)
    # Create the agent session
    session = AgentSession(
        agent=agent,
        pipeline=pipeline
    )
    try:
        # Connect to the room
        await ctx.connect()
        # Start the agent session
        await session.start()
        # Keep running until interrupted
        import asyncio
        await asyncio.Event().wait()
    finally:
        # Clean up resources
        await session.close()
        await ctx.shutdown()

def make_context() -> JobContext:
    """Create the job context with room configuration"""
    room_options = RoomOptions(
        room_id=os.getenv("VIDEOSDK_ROOM_ID", "your-room-id"),
        auth_token=os.getenv("VIDEOSDK_AUTH_TOKEN"),
        name="VideoSDK Customer Agent",
        playground=True
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
import asyncio
import os
import sys
from mcp.server.fastmcp import FastMCP
import discord
from discord.ext import commands

class DiscordHuman:
    def __init__(self, user_id: int, channel_id: int, bot_token: str):
        self.user_id = user_id
        self.channel_id = channel_id
        self.bot_token = bot_token
        self.bot = commands.Bot(command_prefix="!", intents=discord.Intents.all())
        self.response_future = None
        self.setup_bot_events()

    def setup_bot_events(self):
        @self.bot.event
        async def on_ready():
            # Log to stderr: stdout is reserved for MCP messages when the
            # server runs over stdio
            print(f'{self.bot.user} has connected to Discord!', file=sys.stderr)

        @self.bot.event
        async def on_message(message):
            # Accept only replies from the designated human, posted in a thread,
            # while a question is still waiting for an answer
            if (message.author.id == self.user_id and
                message.channel.id in [thread.id for thread in self.bot.get_all_channels() if hasattr(thread, 'parent')] and
                self.response_future and not self.response_future.done()):
                self.response_future.set_result(message.content)

    async def start_bot(self):
        """Start the Discord bot"""
        await self.bot.start(self.bot_token)

    async def ask(self, question: str) -> str:
        if not self.bot.is_ready():
            return "❌ Discord bot is not ready"
        try:
            channel = self.bot.get_channel(self.channel_id)
            if not channel:
                return "❌ Channel not found"
            # Open a thread for the question and mention the human operator
            thread = await channel.create_thread(
                name=question[:100],
                type=discord.ChannelType.public_thread
            )
            await thread.send(f"<@{self.user_id}> {question}")
            # Future that on_message resolves with the human's reply
            self.response_future = asyncio.get_running_loop().create_future()
            try:
                response = await asyncio.wait_for(self.response_future, timeout=600)
                return response
            except asyncio.TimeoutError:
                return "⏱️ Timed out waiting for a human response"
        except Exception as e:
            return f"❌ Error: {str(e)}"

# Initialize Discord human instance
discord_human = DiscordHuman(
    user_id=int(os.getenv("DISCORD_USER_ID")),
    channel_id=int(os.getenv("DISCORD_CHANNEL_ID")),
    bot_token=os.getenv("DISCORD_TOKEN")
)

# MCP Server Setup
mcp = FastMCP("HumanInTheLoopServer")

@mcp.tool(description="Ask a human agent via Discord for a specific user query such as discount percentage, etc.")
async def ask_human(question: str) -> str:
    """Ask a human agent via Discord for assistance"""
    return await discord_human.ask(question)

async def main():
    """Main function to start both the Discord bot and MCP server"""
    # Start Discord bot in background
    bot_task = asyncio.create_task(discord_human.start_bot())
    # Wait a moment for bot to initialize
    await asyncio.sleep(2)
    # Start MCP server over stdio. mcp.run() is blocking and starts its own
    # event loop, so use the awaitable run_stdio_async() inside a coroutine
    await mcp.run_stdio_async()

if __name__ == "__main__":
    asyncio.run(main())
Set the following environment variables:
DISCORD_TOKEN=your_discord_bot_token
DISCORD_USER_ID=human_operator_user_id
DISCORD_CHANNEL_ID=channel_id_for_escalations
DEEPGRAM_API_KEY=your_deepgram_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
VIDEOSDK_AUTH_TOKEN=your_videosdk_token
VIDEOSDK_ROOM_ID=your_room_id
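Since both scripts read these values with os.getenv, a missing variable only surfaces later as a None-related error. A small startup check, sketched below under the assumption that all eight variables are required, can report the problem earlier. The helpers missing_vars and parse_id are hypothetical and not part of any SDK.

```python
from typing import Mapping

# Variable names taken from the list above.
REQUIRED_VARS = (
    "DISCORD_TOKEN",
    "DISCORD_USER_ID",
    "DISCORD_CHANNEL_ID",
    "DEEPGRAM_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
    "VIDEOSDK_AUTH_TOKEN",
    "VIDEOSDK_ROOM_ID",
)


def missing_vars(environ: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def parse_id(environ: Mapping[str, str], name: str) -> int:
    """Convert a Discord ID variable to int with a readable error."""
    raw = environ.get(name, "")
    if not raw.isdigit():
        raise ValueError(f"{name} must be a numeric Discord ID, got {raw!r}")
    return int(raw)


# Demonstration with a sample mapping instead of the real environment.
sample_env = {"DISCORD_TOKEN": "abc", "DISCORD_USER_ID": "1234"}
print(missing_vars(sample_env))
print(parse_id(sample_env, "DISCORD_USER_ID"))
```

In a real script you would pass os.environ instead of sample_env and exit with a clear message if missing_vars returns any names.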
The Discord MCP server provides the ask_human tool, which creates a Discord thread for each question and returns the human operator's reply to the agent. This uses the same MCP integration pattern shown in the Agent with MCP Server section.
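The core of the handoff is a future that one coroutine awaits with a timeout while another event resolves it. The sketch below isolates that pattern using only asyncio. HumanInboxSketch and its deliver method are hypothetical stand-ins for the Discord bot and its on_message handler, and the short timeouts are chosen only to keep the demo fast.

```python
import asyncio
from typing import Optional


class HumanInboxSketch:
    """Hypothetical stand-in for the Discord side of the handoff."""

    def __init__(self) -> None:
        self._pending: Optional[asyncio.Future] = None

    async def ask(self, question: str, timeout: float) -> str:
        """Wait for a human answer, or give up after timeout seconds."""
        self._pending = asyncio.get_running_loop().create_future()
        try:
            return await asyncio.wait_for(self._pending, timeout=timeout)
        except asyncio.TimeoutError:
            return "Timed out waiting for a human response"

    def deliver(self, answer: str) -> None:
        """Resolve the pending question, if one is still waiting."""
        if self._pending is not None and not self._pending.done():
            self._pending.set_result(answer)


async def demo() -> tuple[str, str]:
    inbox = HumanInboxSketch()
    loop = asyncio.get_running_loop()
    # Simulate a human replying shortly after the first question is asked.
    loop.call_later(0.01, inbox.deliver, "10 percent")
    answered = await inbox.ask("What discount can I offer?", timeout=1.0)
    # Nobody replies to the second question, so it times out.
    unanswered = await inbox.ask("Anyone there?", timeout=0.05)
    return answered, unanswered


answered, unanswered = asyncio.run(demo())
print(answered)
print(unanswered)
```

Returning a readable message on timeout, rather than raising, lets the agent tell the user that no human was available instead of failing the tool call.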
The complete implementation, with full source code, setup instructions, and configuration examples, is available in the VideoSDK Agents GitHub repository.
Examples - Try Out Yourself
Check out the examples of function tool usage and the MCP server.