
AI Voice Agents

The VideoSDK AI Agent SDK is a Python framework for integrating intelligent, real-time voice agents into any application. It bridges advanced AI models and human interaction, so you can build natural, engaging, and responsive conversational experiences.

The Architecture

The VideoSDK AI Agents framework connects four key components to enable seamless AI voice interactions:

  • Your Infrastructure hosts the agent management system
  • The Agent Worker creates and manages AI sessions
  • The VideoSDK Room handles real-time meeting operations
  • User Devices connect through web, mobile apps, or phone calls to interact with intelligent agents that can listen, understand, and respond naturally in real-time conversations.
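
To make the relationship between these four components concrete, here is a minimal, self-contained sketch of the flow. It does not use the SDK: `Room`, `AgentSession`, `UserDevice`, and `AgentWorker` are hypothetical stand-ins invented for illustration, and a real session would run speech recognition, a language model, and speech synthesis rather than return a fixed string.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    """Hypothetical stand-in for a VideoSDK Room: relays messages between participants."""
    room_id: str
    participants: list = field(default_factory=list)

    def join(self, participant):
        self.participants.append(participant)

    def broadcast(self, sender, text):
        # Deliver the message to everyone except the sender and collect replies.
        replies = []
        for participant in self.participants:
            if participant is not sender:
                reply = participant.on_message(text)
                if reply is not None:
                    replies.append(reply)
        return replies


class AgentSession:
    """Hypothetical AI session; a real one would listen, reason, and speak."""

    def on_message(self, text):
        return f"Agent heard: {text}"


class UserDevice:
    """Hypothetical web, mobile, or phone participant."""

    def on_message(self, text):
        return None  # users do not auto-reply in this sketch


class AgentWorker:
    """Creates one session per room and keeps track of it."""

    def __init__(self):
        self.sessions = {}

    def start_session(self, room):
        session = AgentSession()
        room.join(session)
        self.sessions[room.room_id] = session
        return session


# "Your infrastructure" decides when to send a worker into a room.
room = Room("demo-room")
worker = AgentWorker()
worker.start_session(room)

# A user device joins the same room and speaks.
user = UserDevice()
room.join(user)
replies = room.broadcast(user, "What are your opening hours?")
print(replies)
```

The point of the sketch is the division of responsibility: the worker owns session lifecycle, the room only routes traffic, and user devices never talk to the agent directly.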


Use Cases

Here are some real-world applications where VideoSDK AI Agents can be deployed to create intelligent, voice-enabled experiences across different industries and scenarios. You can use them as they are or refer to them when building your own customized agent.

The Building Blocks

Our SDK is built on four primary, modular components that work together to create powerful and customizable agents. Understand these concepts, and you're ready to build.

Need Help?

If you have any queries, please feel free to reach out to us using one of the following methods:

Frequently Asked Questions

What programming language and version are required?

The AI Agent SDK is built in Python. You'll need Python 3.12 or higher to use the SDK.
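
If you want to confirm the interpreter version before installing anything, a small check like the one below can help. This is a sketch using only the standard library; the helper name `meets_minimum` is made up for this example and is not part of the SDK.

```python
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in this FAQ


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if not meets_minimum():
    found = sys.version.split()[0]
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")
```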

Can my agent answer phone calls?

Yes. By integrating with our SIP/telephony services, your AI agent can join a room initiated by a standard phone call. This allows you to build powerful IVR systems, automated appointment schedulers, AI-powered call centers, and more.

What AI models are supported?

The SDK supports various AI models including:

  • Real-time Models: OpenAI, Google Gemini, AWS Nova Sonic
  • LLM Providers: OpenAI, Google Gemini, Anthropic Claude, Sarvam AI, Cerebras
  • TTS Providers: ElevenLabs, OpenAI, Google, AWS Polly, Cartesia, and many more
  • STT Providers: OpenAI Whisper, Deepgram, Google, AssemblyAI, and others

Can I use my own custom models?

Absolutely! The SDK's modular architecture allows you to create custom plugins for any AI provider. Check our plugin development guide for detailed instructions.
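
As a rough illustration of what a modular, plugin-based design makes possible, the sketch below defines two small interfaces and swaps in toy implementations. It is not the SDK's actual plugin API: `LLMPlugin`, `TTSPlugin`, `EchoLLM`, `Utf8TTS`, and `respond` are hypothetical names chosen for this example, so follow the plugin development guide for the real base classes and method signatures.

```python
from typing import Protocol


class LLMPlugin(Protocol):
    """Hypothetical interface: any object with a matching complete() qualifies."""

    def complete(self, prompt: str) -> str: ...


class TTSPlugin(Protocol):
    """Hypothetical interface: any object with a matching synthesize() qualifies."""

    def synthesize(self, text: str) -> bytes: ...


class EchoLLM:
    """Toy LLM plugin that restates the prompt instead of calling a model."""

    def complete(self, prompt: str) -> str:
        return f"You said: {prompt}"


class Utf8TTS:
    """Toy TTS plugin that returns UTF-8 bytes instead of audio."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


def respond(transcript: str, llm: LLMPlugin, tts: TTSPlugin) -> bytes:
    """Run one conversational turn: transcript in, 'audio' bytes out."""
    return tts.synthesize(llm.complete(transcript))


audio = respond("hello", EchoLLM(), Utf8TTS())
print(audio)
```

Because `respond` depends only on the interfaces, replacing `EchoLLM` with a wrapper around a different provider would not require changes elsewhere in the turn logic.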

How is pricing handled for the AI Agent SDK?

VideoSDK offers a free tier with limited usage. The AI Agent SDK itself is open-source, but you'll need API keys for the AI services you choose to use (OpenAI, Google, etc.). Check the pricing page for VideoSDK usage limits.

Can agents handle more than just voice?

Absolutely! Agents support multimodal interactions including vision processing, data messages, and real-time video streams. They can also use function tools to interact with external systems and APIs.
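
To show the general idea behind function tools, here is a self-contained sketch of a tool registry: functions are registered by name, described to a model, and invoked from a JSON tool call. Everything here is an assumption made for illustration, including the `tool` decorator, the `describe_tools` and `dispatch` helpers, and the shape of the tool-call JSON; the SDK's own decorator and calling convention may differ.

```python
import inspect
import json

TOOLS = {}  # registry of tool name -> callable


def tool(func):
    """Register a function so the agent can call it by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def get_order_status(order_id: str) -> dict:
    """Look up an order (hard-coded here; a real tool would call an API)."""
    return {"order_id": order_id, "status": "shipped"}


def describe_tools():
    """Build the name, description, and parameter list a model would be shown."""
    return [
        {
            "name": name,
            "description": inspect.getdoc(fn),
            "parameters": list(inspect.signature(fn).parameters),
        }
        for name, fn in TOOLS.items()
    ]


def dispatch(call_json: str):
    """Execute a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise KeyError(f"Unknown tool: {call['name']}")
    return fn(**call.get("arguments", {}))


result = dispatch('{"name": "get_order_status", "arguments": {"order_id": "A123"}}')
print(result)
```

In a real agent, the model would choose the tool and produce the arguments, and the returned value would be fed back into the conversation so the agent can answer the caller.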

Is the SDK production-ready?

Yes, the AI Agent SDK is stable and production-ready. It is designed to be self-hosted on your own infrastructure for full control and scalability, from a single server to a Kubernetes cluster. It includes comprehensive error handling, metrics collection, and deployment flexibility.

Got a question? Ask us on Discord.