Deployments

Overview

The VideoSDK Agents framework provides multiple deployment options to run your AI agents in production environments. Understanding these options helps you choose the right deployment strategy for your specific use case.

VideoSDK Agents supports two primary deployment modes:

Agent Cloud (Managed) - Fully managed deployment hosted on VideoSDK infrastructure
Self-Hosting - Self-managed deployment on your own infrastructure (EC2, Docker, Kubernetes, etc.)

Agent Cloud (Hosted on Our Infrastructure)

Agent Cloud is a fully managed service that handles the deployment, scaling, and maintenance of your AI agents. Deploying to Agent Cloud gives you:

Zero Infrastructure Management: No need to manage servers, containers, or scaling
Automatic Scaling: Built-in load balancing and auto-scaling capabilities
High Availability: Redundant infrastructure with automatic failover
Managed Updates: Automatic security patches and framework updates
Global Distribution: Agents deployed across multiple regions for low latency
Built-in Monitoring: Integrated metrics, logging, and health monitoring

Best for: Teams that want to focus on agent development rather than infrastructure management, or applications with variable traffic patterns.

Self-Hosting (EC2, Docker, or Custom Infrastructure)

Self-hosting gives you complete control over your deployment environment and infrastructure. A self-hosted deployment gives you:

Full Control: Complete control over hardware, networking, and configuration
Custom Integrations: Ability to integrate with existing infrastructure and tools
Cost Optimization: Potential cost savings for high-volume, predictable workloads
Compliance: Meet specific security, compliance, or data residency requirements
Custom Scaling: Implement your own scaling strategies and resource management

Best for: Organizations with existing infrastructure, specific compliance requirements, or predictable high-volume workloads.

When to Choose Agent Cloud vs Self-Hosting

Choose Agent Cloud when:

You want to get started quickly without infrastructure setup
You have variable or unpredictable traffic patterns
You need global distribution and low latency
You want automatic scaling and high availability
You prefer a managed service with built-in monitoring

Choose Self-Hosting when:

You need to meet specific compliance or security requirements
You have predictable, high-volume workloads where cost optimization is important
You require custom integrations with existing systems
You need complete control over the deployment environment

Common Terminology

Understanding these key terms will help you navigate the deployment documentation:

Term	Definition
Agent	Your AI application built using the VideoSDK Agents framework. An agent can handle voice conversations, process audio, and respond with synthesized speech.
Worker	A runtime component that executes your agent code. Workers can run in different environments (Agent Cloud or self-hosted) and handle job assignments from the backend registry system.
Backend Registry	The central service that manages worker registration, job assignment, and load balancing. Workers connect to this registry to receive job assignments and report their status.
Job	A single execution instance of your agent. When a user starts a conversation, the backend registry assigns a job to an available worker.
JobContext	The execution context for a job, containing room configuration, pipeline setup, and session management. This is the main interface your agent code interacts with.
Worker Registration	The process by which self-hosted workers register themselves with the VideoSDK backend registry to receive job assignments.
Load Threshold	A configuration parameter that determines when a worker is considered "at capacity" and should not receive new job assignments.
Health Check	Regular monitoring of worker status to ensure that workers are available and functioning correctly. Workers expose health endpoints for this purpose.
Resource Management	The system for managing worker resources including process/thread allocation, memory limits, and concurrent job handling.
Session Management	Handles the lifecycle of agent sessions including automatic session ending, timeouts, and cleanup.
Horizontal Scaling	Adding worker instances to handle increased load. This is a manual process: you deploy each new worker instance yourself.
Vertical Scaling	Automatic scaling within a single worker, up to its configured maximum capacity (`max_processes`).
Dispatch API	A REST API endpoint that allows you to dynamically dispatch agents to meetings on-demand.
AI Deployment	The deployment configuration that runs your AI agent, either in Agent Cloud or self-hosted environments.
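
To make the relationship between workers, the backend registry, jobs, and the load threshold concrete, here is a minimal toy model in Python. It is an illustrative sketch only and does not use the VideoSDK Agents API: the `Worker` and `Registry` classes, the `assign_job` method, and the definition of load as active jobs divided by `max_processes` are all assumptions made for the example. The real framework may measure load and select workers differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Worker:
    """Hypothetical stand-in for a worker (not the VideoSDK class)."""
    name: str
    max_processes: int      # ceiling for vertical scaling inside this worker
    load_threshold: float   # load at or above this value means "at capacity"
    active_jobs: int = 0

    @property
    def load(self) -> float:
        # Assumption: load is the fraction of process slots in use.
        return self.active_jobs / self.max_processes

    @property
    def accepts_jobs(self) -> bool:
        # A worker at or above its threshold receives no new jobs.
        return self.load < self.load_threshold


@dataclass
class Registry:
    """Hypothetical stand-in for the backend registry."""
    workers: List[Worker] = field(default_factory=list)

    def register(self, worker: Worker) -> None:
        # Worker registration: the worker announces itself to the registry.
        self.workers.append(worker)

    def assign_job(self) -> Optional[str]:
        # Pick the least-loaded worker that is still below its threshold.
        candidates = [w for w in self.workers if w.accepts_jobs]
        if not candidates:
            return None  # every worker is at capacity
        chosen = min(candidates, key=lambda w: w.load)
        chosen.active_jobs += 1
        return chosen.name


if __name__ == "__main__":
    registry = Registry()
    registry.register(Worker("worker-a", max_processes=4, load_threshold=0.75))

    # Vertical scaling: worker-a absorbs jobs until it reaches its threshold.
    print([registry.assign_job() for _ in range(4)])
    # prints ['worker-a', 'worker-a', 'worker-a', None]

    # Horizontal scaling: a second worker instance is deployed and registered.
    registry.register(Worker("worker-b", max_processes=2, load_threshold=1.0))
    print(registry.assign_job())
    # prints worker-b
```

In this model the fourth job is rejected because `worker-a` reaches a load of 0.75 after three jobs, which matches its threshold. Capacity only grows again once another worker instance is registered, which mirrors the split between vertical and horizontal scaling described in the table.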

This terminology will be referenced throughout the deployment documentation as we explore specific deployment scenarios and configurations.
