MCP Integration

The Model Context Protocol (MCP) is an open standard that lets AI assistants connect securely to data sources and tools. With VideoSDK's AI Agents, you can integrate MCP servers to extend your agent's capabilities with external services, applications, databases, and APIs.

MCP Server Types

VideoSDK supports two transport methods for MCP servers:

1. STDIO Transport

Direct process communication
Local Python scripts
Best for custom tools and functions
Ideal for server-side integrations

2. HTTP Transport (Streamable HTTP or SSE)

Network-based communication
External MCP services
Best for third-party integrations
Supports remote MCP servers
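
To make "direct process communication" concrete, here is a minimal sketch of the idea behind the STDIO transport: the parent spawns a child process and exchanges newline-delimited JSON-RPC 2.0 messages over its stdin and stdout. The child below is a toy echo process, not a real MCP server, and `call_over_stdio` and `CHILD_SOURCE` are hypothetical names used only for illustration. A real MCP session also performs an initialization handshake that this sketch omits.

```python
import json
import subprocess
import sys

# Toy child process (NOT a real MCP server): reads one JSON-RPC request per
# line from stdin and writes one JSON-RPC response per line to stdout.
CHILD_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    response = {"jsonrpc": "2.0", "id": request["id"],
                "result": {"echo": request["method"]}}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
'''


def call_over_stdio(method: str, request_id: int = 1) -> dict:
    """Spawn the child, send one request line, and return the parsed reply."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    request = {"jsonrpc": "2.0", "id": request_id, "method": method}
    # communicate() writes the request, closes stdin, and collects stdout.
    stdout, _ = proc.communicate(json.dumps(request) + "\n", timeout=10)
    return json.loads(stdout.splitlines()[0])


if __name__ == "__main__":
    print(call_over_stdio("tools/list"))
```

With the HTTP transport, the same kind of JSON-RPC messages travel over a network connection instead of a local pipe, which is why it suits remote and third-party servers.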

How It Works with VideoSDK's AI Agent

MCP tools are automatically discovered and made available to your agent. The agent chooses which tools to use based on the user's request. When a user asks for information that requires external data, the agent will:

Identify the need for external data based on the user's request
Select appropriate tools from available MCP servers
Execute the tools with relevant parameters
Process the results and provide a natural language response

This seamless integration allows your voice agent to access real-time data and external services while maintaining a natural conversational flow.
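
The four steps above can be sketched in plain Python. This is an illustration of the control flow only, not VideoSDK's implementation: the `TOOLS` registry, `select_tool`, and `handle_request` are hypothetical names, and the keyword match stands in for the decision that the language model makes in a real agent.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical registry of discovered tools: name -> (description, callable).
# The lambda returns a fixed value so the example is deterministic.
TOOLS: Dict[str, Tuple[str, Callable[[], str]]] = {
    "get_current_time": ("Get the current time", lambda: "12:00:00"),
}


def select_tool(user_request: str) -> Optional[str]:
    """Stand-in for the model's choice: a naive keyword match."""
    if "time" in user_request.lower():
        return "get_current_time"
    return None


def handle_request(user_request: str) -> str:
    # Identify the need for external data and select a matching tool.
    tool_name = select_tool(user_request)
    if tool_name is None:
        return "I can answer that without external data."
    # Execute the selected tool.
    _description, tool = TOOLS[tool_name]
    result = tool()
    # Turn the raw result into a natural language response.
    return f"It is currently {result}."


if __name__ == "__main__":
    print(handle_request("What time is it?"))
```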

Creating an MCP Server

Basic MCP Server Structure

The following example is a simple MCP server that uses the STDIO transport to return the current time. First, install the required package:

pip install fastmcp

mcp_stdio_example.py
from mcp.server.fastmcp import FastMCP
import datetime

# Create the MCP server
mcp = FastMCP("CurrentTimeServer")

@mcp.tool()
def get_current_time() -> str:
    """Get the current local time on the machine running this server"""

    # Get current time
    now = datetime.datetime.now()
    
    # Return formatted time string
    return f"The current time is {now.strftime('%H:%M:%S')} on {now.strftime('%Y-%m-%d')}"

if __name__ == "__main__":
    # Run the server with STDIO transport
    mcp.run(transport="stdio")
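
Note that `datetime.datetime.now()` returns the local time of the machine running the server, which is not necessarily the user's time zone. One possible refinement, sketched below under that assumption, is to move the formatting into a pure helper that is easy to test and to accept an explicit UTC offset. The names `format_time` and `current_time_at_offset` are hypothetical and not part of the original example; either function could be called from inside a tool such as `get_current_time`.

```python
import datetime


def format_time(now: datetime.datetime) -> str:
    """Format a datetime using the same message as the example tool."""
    return f"The current time is {now.strftime('%H:%M:%S')} on {now.strftime('%Y-%m-%d')}"


def current_time_at_offset(utc_offset_hours: float) -> str:
    """Return the formatted current time at a fixed UTC offset (hypothetical helper)."""
    tz = datetime.timezone(datetime.timedelta(hours=utc_offset_hours))
    return format_time(datetime.datetime.now(tz))


if __name__ == "__main__":
    # Example: current time at UTC+05:30.
    print(current_time_at_offset(5.5))
```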

Integrating MCP with VideoSDK Agent

The following example integrates both a STDIO and an HTTP MCP server with a VideoSDK AI Agent:

main.py
import asyncio
import sys
from pathlib import Path  # Path is used below to locate the MCP script
from videosdk.agents import Agent, AgentSession, RealTimePipeline, MCPServerStdio, MCPServerHTTP
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

class MyVoiceAgent(Agent):
    def __init__(self):
        # Define paths to your MCP servers
        mcp_script = Path(__file__).parent.parent / "MCP_Example" / "mcp_stdio_example.py"
        super().__init__(
            instructions="""You are a helpful assistant with access to real-time data. 
            You can provide current time information. 
            Always be conversational and helpful in your responses.""",
            mcp_servers=[
                # STDIO MCP Server (Local Python script for time)
                MCPServerStdio(
                    command=sys.executable,  # Use current Python interpreter
                    args=[str(mcp_script)],
                    client_session_timeout_seconds=30
                ),
                # HTTP MCP Server (external service, e.g., Zapier)
                MCPServerHTTP(
                    url="https://your-mcp-service.com/api/mcp",
                    client_session_timeout_seconds=30
                )
            ]
        )

    async def on_enter(self) -> None:
        await self.session.say("Hi there! How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Thank you for using the assistant. Goodbye!")

async def main(context: dict):
    
    # Configure Gemini Realtime model
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        config=GeminiLiveConfig(
            voice="Leda",  # Available voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr
            response_modalities=["AUDIO"]
        )
    )

    pipeline = RealTimePipeline(model=model)
    agent = MyVoiceAgent()

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        context=context
    )

    try:
        # Start the session
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()

if __name__ == "__main__":
    def make_context():
        # If VIDEOSDK_AUTH_TOKEN is set in .env, omit the "videosdk_auth" key
        return {
            "meetingId": "your_actual_meeting_id_here",  # Replace with actual meeting ID
            "name": "AI Voice Agent",
            "videosdk_auth": "your_videosdk_auth_token_here"  # Replace with actual token
        }

    # Run the agent; without this call the script defines main() but never starts it
    asyncio.run(main(context=make_context()))
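
The comment in `make_context` says to leave out `videosdk_auth` when `VIDEOSDK_AUTH_TOKEN` is already set. A minimal sketch of that rule follows. It assumes the variable has already been loaded into the process environment (for example from `.env` by whatever loader you use), and the parameterized `make_context` signature shown here is a hypothetical variant, not the SDK's API.

```python
import os
from typing import Dict, Optional


def make_context(
    meeting_id: str,
    name: str = "AI Voice Agent",
    auth_token: Optional[str] = None,
) -> Dict[str, str]:
    """Build the session context, omitting videosdk_auth when the env var is set."""
    context = {"meetingId": meeting_id, "name": name}
    # Only pass the token explicitly if the environment does not already provide it.
    if not os.environ.get("VIDEOSDK_AUTH_TOKEN") and auth_token:
        context["videosdk_auth"] = auth_token
    return context


if __name__ == "__main__":
    print(make_context("your_actual_meeting_id_here", auth_token="your_videosdk_auth_token_here"))
```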

Tip

Get started quickly with the Quick Start Example for the VideoSDK AI Agent SDK with MCP, which has everything you need to build your first AI agent.
