Agent

The Agent class is the base class for defining AI agent behavior and capabilities. It provides the foundation for creating intelligent conversational agents with support for function tools, MCP servers, and advanced lifecycle management.

Basic Usage

Simple Agent

To create a simple agent, subclass Agent and pass instructions to its constructor. The instructions string defines how the agent should behave.

main.py
from videosdk.agents import Agent

class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant."
        )

Agent with Function Tools

Function tools allow your agent to perform actions and interact with external services, extending its capabilities beyond simple conversation. You can register tools that are defined either outside or inside your agent class.

External Tools

External tools are defined as standalone functions and are passed into the agent's constructor via the tools list. This is useful for sharing common tools across multiple agents.

main.py
from videosdk.agents import Agent, function_tool

# External tool defined outside the class
@function_tool(description="Get weather information")
def get_weather(location: str) -> str:
    """Get weather information for a specific location."""
    # Weather logic here
    return f"Weather in {location}: Sunny, 72°F"

class WeatherAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a weather assistant.",
            tools=[get_weather]  # Register the external tool
        )
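Tool-calling setups generally need a name, a description, and a parameter list for each tool so the model can decide when to call it. The sketch below shows how that metadata can be read from a plain function using only the standard library. It is an illustration, not how videosdk builds its tool definitions: describe_tool and the shape of the returned dictionary are hypothetical, and get_weather is left undecorated so the snippet runs on its own.

```python
import inspect
from typing import Any, Callable, get_type_hints


def _type_name(tp: Any) -> str:
    """Return a readable name for a type annotation."""
    return getattr(tp, "__name__", str(tp))


def describe_tool(func: Callable[..., Any]) -> dict:
    """Build a simple, hypothetical tool description from a plain function."""
    hints = get_type_hints(func)
    parameters = {
        name: _type_name(hints.get(name, Any))
        for name in inspect.signature(func).parameters
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": parameters,
        "returns": _type_name(hints.get("return", Any)),
    }


def get_weather(location: str) -> str:
    """Get weather information for a specific location."""
    return f"Weather in {location}: Sunny, 72°F"


tool_info = describe_tool(get_weather)
```

Clear type hints and a one-line docstring give a description like this something useful to work with, which is a good habit for tool functions regardless of the framework.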

Internal Tools

Internal tools are defined as methods within your agent class and are decorated with @function_tool. This is useful for logic that is specific to the agent and needs access to its internal state (self).

main.py
from videosdk.agents import Agent, function_tool

class FinanceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful financial assistant."
        )
        self.portfolio = {"AAPL": 10, "GOOG": 5}

    @function_tool
    def get_portfolio_value(self) -> dict:
        """Get the current value of the user's stock portfolio."""
        # In a real scenario, you'd fetch live stock prices
        # This is a simplified example
        return {"total_value": 5000, "holdings": self.portfolio}
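To see why a method-based tool can read the agent's state, it helps to look at how decorated methods behave once they are bound to an instance. The following is a minimal sketch using only the standard library. mark_tool, SketchAgent, and collect_tools are hypothetical stand-ins, not the videosdk decorator, base class, or discovery logic.

```python
import inspect
from typing import Any, Callable


def mark_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical stand-in for @function_tool: tags the function as a tool."""
    func._is_tool = True
    return func


class SketchAgent:
    """Plain class standing in for an Agent subclass (not the real base class)."""

    def __init__(self) -> None:
        self.portfolio = {"AAPL": 10, "GOOG": 5}

    @mark_tool
    def get_portfolio_value(self) -> dict:
        """Get the current value of the user's stock portfolio."""
        return {"total_value": 5000, "holdings": self.portfolio}

    def helper(self) -> str:
        """Undecorated method, so it is not collected as a tool."""
        return "not a tool"


def collect_tools(agent: object) -> dict:
    """Return the bound methods tagged by mark_tool, keyed by method name."""
    return {
        name: method
        for name, method in inspect.getmembers(agent, predicate=inspect.ismethod)
        if getattr(method, "_is_tool", False)
    }


agent = SketchAgent()
tools = collect_tools(agent)
first_result = tools["get_portfolio_value"]()

# The bound method sees later changes to the instance state.
agent.portfolio["MSFT"] = 3
second_result = tools["get_portfolio_value"]()
```

Because each collected entry is a bound method, calling it needs no extra arguments and always reflects the current value of self.portfolio.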

Agent with MCP Server

MCPServerStdio enables your agent to communicate with external processes via standard input/output streams. This is ideal for integrating complex, standalone Python scripts or other local executables as tools.

main.py
import sys
from pathlib import Path
from videosdk.agents import Agent, MCPServerStdio

# Path to your external Python script that runs the MCP server
mcp_server_path = Path(__file__).parent / "mcp_server_script.py"

class MCPAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are an assistant that can leverage external tools via MCP.",
            mcp_servers=[
                MCPServerStdio(
                    executable_path=sys.executable,
                    process_arguments=[str(mcp_server_path)],
                    session_timeout=30
                )
            ]
        )
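The idea of talking to a child process over standard streams can be illustrated without any MCP library. The following is a minimal sketch, using only the standard library, in which a parent process starts a child with sys.executable and exchanges one JSON message per line over the child's stdin and stdout. The message shape ("id", "method", "params"), the shout method, and call_child are invented for illustration; a real MCP server speaks the MCP protocol over these streams instead.

```python
import json
import subprocess
import sys

# Source of the child process: it reads one JSON request per line from stdin
# and writes one JSON reply per line to stdout.
CHILD_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"id": request["id"], "result": request["params"]["text"].upper()}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
'''


def call_child(text: str) -> dict:
    """Start the child, send one request over stdin, and read one reply from stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        request = {"id": 1, "method": "shout", "params": {"text": text}}
        proc.stdin.write(json.dumps(request) + "\n")
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())
    finally:
        # Closing stdin ends the child's read loop so it can exit cleanly.
        proc.stdin.close()
        proc.wait(timeout=10)
        proc.stdout.close()


reply = call_child("hello")
```

Flushing after every write matters on both sides: without it, a message can sit in a buffer while the other process waits for it.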

Agent Lifecycle and Methods

The Agent class provides lifecycle hooks and methods to manage state and behavior at critical points in the agent's session.

Lifecycle Hooks

These methods are designed to be overridden in your custom agent class to implement specific behaviors.

  • async def on_enter(self) -> None: Called once when the agent successfully joins the meeting. This is the ideal place for introductions or initial actions, such as greeting participants.
  • async def on_exit(self) -> None: Called when the agent is about to exit the meeting. Use this for cleanup tasks or for saying goodbye.

main.py
from videosdk.agents import Agent

class LifecycleAgent(Agent):
    async def on_enter(self):
        print("Agent has entered the meeting.")
        await self.session.say("Hello everyone! I'm here to help.")

    async def on_exit(self):
        print("Agent is exiting the meeting.")
        await self.session.say("It was a pleasure assisting you. Goodbye!")
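Hook logic can be exercised offline before it ever runs in a meeting. The sketch below is one possible approach under stated assumptions: FakeSession is a hypothetical stand-in that only records what say() receives, GreeterHooks is a plain class that mirrors the hook signatures rather than a real Agent subclass, and run_lifecycle imitates the enter-then-exit order described above.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for the agent session; it only records say() calls."""

    def __init__(self) -> None:
        self.spoken = []

    async def say(self, message: str) -> None:
        self.spoken.append(message)


class GreeterHooks:
    """Plain class mirroring the hook signatures (not a real Agent subclass)."""

    def __init__(self, session: FakeSession) -> None:
        self.session = session

    async def on_enter(self) -> None:
        await self.session.say("Hello everyone! I'm here to help.")

    async def on_exit(self) -> None:
        await self.session.say("It was a pleasure assisting you. Goodbye!")


async def run_lifecycle(hooks: GreeterHooks) -> None:
    """Call on_enter first and guarantee on_exit runs, even if the body fails."""
    await hooks.on_enter()
    try:
        pass  # The conversation itself would happen here.
    finally:
        await hooks.on_exit()


session = FakeSession()
asyncio.run(run_lifecycle(GreeterHooks(session)))
```

Recording messages instead of speaking them makes it easy to assert on greeting and farewell text in an ordinary unit test.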

Human in the Loop (HITL)

Human in the Loop enables AI agents to escalate specific queries to human operators for review and approval. This implementation uses Discord as the human interface through an MCP server, allowing seamless handoffs between AI automation and human oversight.

Use Cases

  • Discount Requests: AI escalates pricing queries to human sales agents
  • Complex Support: Technical issues requiring human expertise
  • Policy Decisions: Requests that need human approval or clarification
  • Escalation Scenarios: Situations where AI confidence is low

Implementation

The HITL pattern combines the Agent's MCP server capability with a Discord-based human interface:

main.py
from videosdk.agents import Agent, MCPServerStdio, CascadingPipeline, AgentSession, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.anthropic import AnthropicLLM
from videosdk.plugins.google import GoogleTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector
import asyncio
import pathlib
import sys
import os
from typing import Optional

class CustomerAgent(Agent):
    def __init__(self, ctx: Optional[JobContext] = None):
        current_dir = pathlib.Path(__file__).parent
        discord_mcp_server_path = current_dir / "discord_mcp_server.py"

        super().__init__(
            instructions="You are a customer-facing agent for VideoSDK. You have access to various tools to assist with customer inquiries, provide support, and handle tasks. When a user asks for a discount percentage, always use the appropriate tool to retrieve and provide the accurate answer from your superior human agent.",
            mcp_servers=[
                MCPServerStdio(
                    executable_path=sys.executable,
                    process_arguments=[str(discord_mcp_server_path)],
                    session_timeout=30
                ),
            ]
        )
        self.ctx = ctx

    async def on_enter(self) -> None:
        """Called when the agent first joins the meeting"""
        await self.session.say("Hi! I'm your VideoSDK customer support agent. How can I help you today?")

    async def on_exit(self) -> None:
        """Called when the agent exits the meeting"""
        await self.session.say("Thank you for contacting VideoSDK support. Have a great day!")

# Pipeline configuration integrated into the main setup
def create_pipeline() -> CascadingPipeline:
    """Create and configure the cascading pipeline with all components"""
    return CascadingPipeline(
        stt=DeepgramSTT(api_key=os.getenv("DEEPGRAM_API_KEY")),
        llm=AnthropicLLM(api_key=os.getenv("ANTHROPIC_API_KEY")),
        tts=GoogleTTS(api_key=os.getenv("GOOGLE_API_KEY")),
        vad=SileroVAD(),
        turn_detector=TurnDetector(threshold=0.8)
    )

async def start_session(ctx: JobContext):
    """Main entry point that creates agent, pipeline, and starts the session"""
    # Create the pipeline
    pipeline = create_pipeline()

    # Create the agent with context
    agent = CustomerAgent(ctx=ctx)

    # Create the agent session
    session = AgentSession(
        agent=agent,
        pipeline=pipeline
    )

    try:
        # Connect to the room
        await ctx.connect()

        # Start the agent session
        await session.start()

        # Keep running until interrupted
        await asyncio.Event().wait()

    finally:
        # Clean up resources
        await session.close()
        await ctx.shutdown()

def make_context() -> JobContext:
    """Create the job context with room configuration"""
    room_options = RoomOptions(
        room_id=os.getenv("VIDEOSDK_ROOM_ID", "your-room-id"),
        auth_token=os.getenv("VIDEOSDK_AUTH_TOKEN"),
        name="VideoSDK Customer Agent",
        playground=True
    )

    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()

The Discord MCP server provides the ask_human tool, which creates Discord threads where human operators can respond. It uses the same MCP integration pattern shown in the Agent with MCP Server section above.
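The core of an ask_human style tool is a request that waits for a person to answer and gives up gracefully if nobody does. The following is a minimal sketch of that wait-with-timeout pattern using asyncio only. It does not reproduce the Discord MCP server: HumanDesk, its in-memory ticket store, the reply method, the fallback message, and the default timeout are all hypothetical.

```python
import asyncio


class HumanDesk:
    """In-memory stand-in for the human channel (Discord in the real setup)."""

    def __init__(self) -> None:
        self._pending = {}
        self._next_id = 0

    async def ask_human(self, question: str, timeout: float = 30.0) -> str:
        """Register a question and wait for a human reply, or fall back on timeout."""
        self._next_id += 1
        ticket = self._next_id
        future = asyncio.get_running_loop().create_future()
        self._pending[ticket] = future
        try:
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            return "No human operator responded in time."
        finally:
            self._pending.pop(ticket, None)

    def reply(self, ticket: int, answer: str) -> None:
        """Deliver a human answer to the question waiting on this ticket."""
        future = self._pending.get(ticket)
        if future is not None and not future.done():
            future.set_result(answer)


async def demo() -> tuple:
    desk = HumanDesk()

    # Case 1: an operator answers while the agent is waiting.
    task = asyncio.create_task(desk.ask_human("Discount for 50 seats?"))
    await asyncio.sleep(0)  # Let ask_human register its ticket first.
    desk.reply(1, "Offer 15%.")
    answered = await task

    # Case 2: nobody answers, so the short timeout triggers the fallback.
    unanswered = await desk.ask_human("Anyone there?", timeout=0.05)
    return answered, unanswered


answered, unanswered = asyncio.run(demo())
```

Returning a fallback string instead of raising keeps the conversation moving: the agent can tell the customer that a human has not replied yet rather than failing the tool call.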

The complete implementation, with full source code, setup instructions, and configuration examples, is available in the VideoSDK Agents GitHub repository.

Examples - Try It Yourself

Check out the examples of function tool usage and the MCP server.
