AI Voice Agents
The VideoSDK AI Agent SDK is a powerful Python framework for developers to seamlessly integrate intelligent, real-time voice agents into any application. Bridge the gap between advanced AI models and human interaction, creating natural, engaging, and responsive conversational experiences.
AI Voice Agent Quickstart
Build an AI Voice Agent in less than 10 minutes
AI Telephony Agent Quickstart
Build an AI Telephony Agent in less than 10 minutes
GitHub Repository
The VideoSDK agent code and examples
SDK Reference
Reference docs for the Agents framework
The Architecture
The VideoSDK AI Agents framework connects four key components to enable seamless AI voice interactions:
- Your Infrastructure hosts the agent management system
- The Agent Worker creates and manages AI sessions
- The VideoSDK Room handles real-time meeting operations
- User Devices connect through web apps, mobile apps, or phone calls to interact with agents that listen, understand, and respond naturally in real time
Use Cases
Here are some real-world applications where VideoSDK AI Agents can be deployed to create intelligent, voice-enabled experiences across different industries and scenarios. You can use these examples as they are, or refer to them when building your own customized agent.
Multi-Agent System
See the Agent-to-Agent Protocol in action: a general customer care agent transfers loan-related queries to a Loan Specialist Agent.
Agent with MCP Server
Integrate a Model Context Protocol (MCP) server with VideoSDK Agents.
RAG Agent
An example of a Retrieval-Augmented Generation (RAG) agent for knowledge-grounded conversations.
Translator Agent
A translator assistant that can speak to users in their own language.
AI Telephony Agent
Build a fully functional AI Telephony Agent for Inbound and Outbound calls.
WhatsApp Call Agent
Streamline your business with a WhatsApp AI call agent that handles inbound queries and places outbound calls.
Agent with Wakeup Call
An agent that maintains engagement by triggering automatically after a specified period of inactivity.
Virtual Avatar Agent
Bring your AI agent to life with a virtual avatar that can interact visually during conversations.
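The inactivity trigger behind the Agent with Wakeup Call use case can be sketched in plain Python. This is a minimal illustration, not the SDK's implementation: `InactivityTimer`, `touch`, `poll`, and the `timeout_s` parameter are hypothetical names chosen for this example, and the clock is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable


class InactivityTimer:
    """Tracks time since the last user activity.

    Hypothetical helper for illustration only; not part of the VideoSDK API.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_activity = clock()

    def touch(self) -> None:
        """Record user activity (for example, detected speech) and reset the countdown."""
        self._last_activity = self._clock()

    def is_due(self) -> bool:
        """Return True once the inactivity period has fully elapsed."""
        return self._clock() - self._last_activity >= self.timeout_s

    def poll(self, on_wake_up: Callable[[], None]) -> bool:
        """Fire the callback and restart the countdown if the timeout has elapsed."""
        if not self.is_due():
            return False
        on_wake_up()
        self.touch()
        return True
```

In an agent loop, `touch()` would be called whenever the user speaks, and `poll()` would be called periodically with a callback that makes the agent re-engage the user.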
The Building Blocks
Our SDK is built on four primary, modular components that work together to create powerful and customizable agents. Understand these concepts, and you're ready to build.
Agent Capabilities
Build sophisticated agents with function tools, vision, human-in-the-loop, and agent-to-agent (A2A) communication.
Deployment Options
Deploy your agent in the cloud or self-host it on your own infrastructure.
Observability
Monitor and debug with confidence using our built-in session analytics, latency tracking, and detailed traces.
Plugin Ecosystem
Integrate with dozens of providers like OpenAI, Google, Anthropic, and ElevenLabs for STT, LLM, and TTS.
Need Help?
If you have any queries, please feel free to reach out to us using one of the following methods:
Discord
Join our Discord Community.
GitHub
Ask your questions on GitHub.
Support
Talk to an expert, book a demo, or talk to sales.
Frequently Asked Questions
What programming language and version are required?
The AI Agent SDK is built in Python. You'll need Python 3.12 or higher to use the SDK.
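A quick way to confirm that an environment meets this requirement is to check the interpreter version before installing anything. The sketch below uses only the standard library; the helper name `python_is_supported` is an assumption made for this example.

```python
import sys

# Minimum interpreter version stated in the FAQ answer above.
MIN_PYTHON = (3, 12)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the given version meets the minimum (major, minor)."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {found} is supported")
    else:
        print(f"Python 3.12 or higher is required, found {found}")
```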
Can my agent answer phone calls?
Yes. By integrating with our SIP/telephony services, your AI agent can join a room initiated by a standard phone call. This allows you to build powerful IVR systems, automated appointment schedulers, AI-powered call centers, and more.
What AI models are supported?
The SDK supports various AI models including:
- Real-time Models: OpenAI, Google Gemini, AWS Nova Sonic
- LLM Providers: OpenAI, Google Gemini, Anthropic Claude, Sarvam AI, Cerebras
- TTS Providers: ElevenLabs, OpenAI, Google, AWS Polly, Cartesia, and many more
- STT Providers: OpenAI Whisper, Deepgram, Google, AssemblyAI, and others
Can I use my own custom models?
Absolutely! The SDK's modular architecture allows you to create custom plugins for any AI provider. Check our plugin development guide for detailed instructions.
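To illustrate what a modular, pluggable design can look like, here is a minimal sketch in plain Python. It does not use the SDK's actual plugin base classes, which are covered in the plugin development guide; `LanguageModel`, `TextToSpeech`, `EchoLLM`, `BytesTTS`, and `respond` are all hypothetical names invented for this example.

```python
from typing import Protocol


class LanguageModel(Protocol):
    """Hypothetical LLM plugin interface; the SDK's real base class may differ."""

    def reply(self, prompt: str) -> str: ...


class TextToSpeech(Protocol):
    """Hypothetical TTS plugin interface; the SDK's real base class may differ."""

    def synthesize(self, text: str) -> bytes: ...


class EchoLLM:
    """Stand-in custom model that echoes the user's words back."""

    def reply(self, prompt: str) -> str:
        return f"You said: {prompt}"


class BytesTTS:
    """Stand-in custom TTS that returns UTF-8 bytes instead of real audio."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


def respond(llm: LanguageModel, tts: TextToSpeech, user_text: str) -> bytes:
    """Run one turn: generate a reply, then convert it to 'audio'."""
    return tts.synthesize(llm.reply(user_text))
```

Because `respond` depends only on the two interfaces, either stage can be swapped for a different provider without touching the rest of the pipeline, which is the general idea behind a plugin architecture.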
How is pricing handled for the AI Agent SDK?
VideoSDK offers a free tier with limited usage. The AI Agent SDK itself is open-source, but you'll need API keys for the AI services you choose to use (OpenAI, Google, etc.). Check the pricing page for VideoSDK usage limits.
Can agents handle more than just voice?
Absolutely! Agents support multimodal interactions including vision processing, data messages, and real-time video streams. They can also use function tools to interact with external systems and APIs.
Is the SDK production-ready?
Yes, the AI Agent SDK is stable and production-ready. It is designed to be self-hosted on your own infrastructure for full control and scalability, from a single server to a Kubernetes cluster. It includes comprehensive error handling, metrics collection, and deployment flexibility.
Got a question? Ask us on Discord.