Deployments
Overview
The VideoSDK Agents framework provides multiple deployment options to run your AI agents in production environments. Understanding these options helps you choose the right deployment strategy for your specific use case.
VideoSDK Agents supports two primary deployment modes:
- Agent Cloud (Managed) - Fully managed deployment hosted on VideoSDK infrastructure
- Self-Hosting - Self-managed deployment on your own infrastructure (EC2, Docker, Kubernetes, etc.)
Agent Cloud (Hosted on Our Infrastructure)
Agent Cloud is a fully managed service that handles the deployment, scaling, and maintenance of your AI agents. When you deploy to Agent Cloud:
- Zero Infrastructure Management: No need to manage servers, containers, or scaling
- Automatic Scaling: Built-in load balancing and auto-scaling capabilities
- High Availability: Redundant infrastructure with automatic failover
- Managed Updates: Automatic security patches and framework updates
- Global Distribution: Agents deployed across multiple regions for low latency
- Built-in Monitoring: Integrated metrics, logging, and health monitoring
Best for: Teams that want to focus on agent development rather than infrastructure management, or applications with variable traffic patterns.
Self-Hosting (EC2, Docker, or Custom Infrastructure)
Self-hosting gives you complete control over your deployment environment and infrastructure. When self-hosting:
- Full Control: Complete control over hardware, networking, and configuration
- Custom Integrations: Ability to integrate with existing infrastructure and tools
- Cost Optimization: Potential cost savings for high-volume, predictable workloads
- Compliance: Meet specific security, compliance, or data residency requirements
- Custom Scaling: Implement your own scaling strategies and resource management
Best for: Organizations with existing infrastructure, specific compliance requirements, or predictable high-volume workloads.
When to Choose Agent Cloud vs Self-Hosting
Choose Agent Cloud when:
- You want to get started quickly without infrastructure setup
- You have variable or unpredictable traffic patterns
- You need global distribution and low latency
- You want automatic scaling and high availability
- You prefer a managed service with built-in monitoring
Choose Self-Hosting when:
- You need to meet specific compliance or security requirements
- You have predictable, high-volume workloads where cost optimization is important
- You require custom integrations with existing systems
- You need complete control over the deployment environment
Common Terminology
Understanding these key terms will help you navigate the deployment documentation:
Term | Definition |
---|---|
Agent | Your AI application built using the VideoSDK Agents framework. An agent can handle voice conversations, process audio, and respond with synthesized speech. |
Worker | A runtime component that executes your agent code. Workers can run in different environments (Agent Cloud or self-hosted) and handle job assignments from the backend registry system. |
Backend Registry | The central service that manages worker registration, job assignment, and load balancing. Workers connect to this registry to receive job assignments and report their status. |
Job | A single execution instance of your agent. When a user starts a conversation, the backend registry assigns a job to an available worker. |
JobContext | The execution context for a job, containing room configuration, pipeline setup, and session management. This is the main interface your agent code interacts with. |
Worker Registration | The process by which self-hosted workers register themselves with the VideoSDK backend registry to receive job assignments. |
Load Threshold | A configuration parameter that determines when a worker is considered "at capacity" and should not receive new job assignments. |
Health Check | Regular monitoring of worker status to confirm that each worker is available and functioning correctly. Workers expose health endpoints for this purpose. |
Resource Management | The system for managing worker resources including process/thread allocation, memory limits, and concurrent job handling. |
Session Management | Handles the lifecycle of agent sessions including automatic session ending, timeouts, and cleanup. |
Horizontal Scaling | Handling increased load by deploying additional worker instances. This is a manual process; new worker instances are not created automatically. |
Vertical Scaling | Automatic scaling within a single worker, up to its configured maximum capacity (max_processes). |
Dispatch API | A REST API endpoint that allows you to dynamically dispatch agents to meetings on-demand. |
AI Deployment | The deployment configuration that runs your AI agent, either in Agent Cloud or self-hosted environments. |
This terminology will be referenced throughout the deployment documentation as we explore specific deployment scenarios and configurations.
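To make the relationship between Worker, Backend Registry, Load Threshold, and the two scaling modes more concrete, the following is a minimal toy model in Python. It is not the VideoSDK Agents API: the `Worker` class, its `load_threshold`, `active_jobs`, and `healthy` fields, the `assign_job` function, and the least-loaded selection policy are all hypothetical names and assumptions chosen for illustration. Only `max_processes` is a name taken from the table above, and how the real registry computes load or picks a worker may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    """Toy stand-in for a worker (hypothetical, not the VideoSDK class)."""
    name: str
    max_processes: int      # ceiling for vertical scaling inside this worker
    load_threshold: float   # fraction of capacity treated as "at capacity"
    active_jobs: int = 0
    healthy: bool = True    # would be derived from health checks in practice

    @property
    def load(self) -> float:
        # Assumption: load is the share of process slots currently in use.
        return self.active_jobs / self.max_processes

    def accepts_jobs(self) -> bool:
        # A worker stops receiving jobs once it reaches its load threshold.
        return self.healthy and self.load < self.load_threshold


def assign_job(workers: List[Worker]) -> Optional[Worker]:
    """Toy registry logic: give the job to the least-loaded eligible worker.

    Returns None when every worker is unhealthy or at capacity, which is the
    point where another worker instance would have to be deployed
    (horizontal scaling).
    """
    eligible = [w for w in workers if w.accepts_jobs()]
    if not eligible:
        return None
    chosen = min(eligible, key=lambda w: w.load)
    chosen.active_jobs += 1  # the worker scales vertically to run the job
    return chosen


if __name__ == "__main__":
    pool = [
        Worker("worker-a", max_processes=4, load_threshold=0.75),
        Worker("worker-b", max_processes=2, load_threshold=0.5),
    ]
    for _ in range(5):
        picked = assign_job(pool)
        print(picked.name if picked else "no capacity: deploy another worker")
```

In this sketch each worker absorbs jobs on its own until it reaches its threshold (vertical scaling), and a `None` result marks the moment when the pool itself has to grow (horizontal scaling).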