Cascading Pipeline

The Cascading Pipeline component provides a flexible, modular approach to building AI agents by allowing you to mix and match different components for Speech-to-Text (STT), Large Language Models (LLM), Text-to-Speech (TTS), Voice Activity Detection (VAD), and Turn Detection.
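A cascading pipeline runs its stages in sequence: speech is transcribed, the transcript is answered, and the answer is spoken. Because each stage sits behind its own interface, any one provider can be swapped without touching the others. The sketch below is a toy illustration of that idea. `ToyCascade`, `EchoSTT`, `UpperLLM`, and `BytesTTS` are hypothetical stand-ins, not part of the VideoSDK API, and VAD and turn detection are left out for brevity.

```python
from typing import Protocol


# Hypothetical stage interfaces: any object with the right method fits.
class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def respond(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class EchoSTT:
    """Stand-in STT: treats the audio bytes as UTF-8 text."""
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    """Stand-in LLM: replies by upper-casing the user's text."""
    def respond(self, text: str) -> str:
        return f"YOU SAID: {text.upper()}"


class BytesTTS:
    """Stand-in TTS: encodes the reply as bytes instead of real audio."""
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class ToyCascade:
    """Runs one user turn through STT -> LLM -> TTS, in that order."""
    def __init__(self, stt: STT, llm: LLM, tts: TTS) -> None:
        self.stt, self.llm, self.tts = stt, llm, tts

    def run_turn(self, audio: bytes) -> bytes:
        transcript = self.stt.transcribe(audio)
        reply = self.llm.respond(transcript)
        return self.tts.synthesize(reply)


pipeline = ToyCascade(stt=EchoSTT(), llm=UpperLLM(), tts=BytesTTS())
print(pipeline.run_turn(b"hello"))  # b'YOU SAID: HELLO'
```

Replacing `UpperLLM` with any other class that has a `respond(text) -> str` method changes the "LLM provider" without modifying the STT or TTS stages.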

Key Features:

  • Modular Component Selection - Choose different providers for each component
  • Flexible Configuration - Mix and match STT, LLM, TTS, VAD, and Turn Detection
  • Custom Processing - Add custom processing for STT and LLM outputs
  • Provider Agnostic - Support for multiple AI service providers
  • Advanced Control - Fine-tune each component independently
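Custom processing of STT and LLM outputs can be thought of as small text-to-text functions inserted between stages. The following is a minimal sketch under that assumption: `run_turn`, `normalize_transcript`, `redact_digits`, and `fake_llm` are hypothetical names chosen for illustration and do not show how VideoSDK itself exposes this hook.

```python
from typing import Callable

# Hypothetical hook type: receives a stage's text output, returns the text to pass on.
Hook = Callable[[str], str]


def normalize_transcript(text: str) -> str:
    """Example STT hook: collapse extra whitespace and drop filler words."""
    fillers = {"um", "uh"}
    words = [w for w in text.split() if w.lower() not in fillers]
    return " ".join(words)


def redact_digits(text: str) -> str:
    """Example LLM-output hook: mask digits before the reply reaches TTS."""
    return "".join("#" if ch.isdigit() else ch for ch in text)


def run_turn(transcript: str, llm: Callable[[str], str],
             stt_hook: Hook, llm_hook: Hook) -> str:
    """Apply the STT hook, call the LLM, then apply the LLM-output hook."""
    cleaned = stt_hook(transcript)
    reply = llm(cleaned)
    return llm_hook(reply)


def fake_llm(text: str) -> str:
    """Stand-in LLM that echoes the request with a fixed order number."""
    return f"Order 4821 noted: {text}"


print(run_turn("um  book a   table uh for two", fake_llm,
               normalize_transcript, redact_digits))
# Order #### noted: book a table for two
```

Keeping hooks as plain functions makes each one testable on its own and lets the same hook be reused regardless of which STT or LLM provider is plugged in.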

Example Implementation:

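```python
import os

from videosdk.agents import CascadingPipeline
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector

# Speech-to-Text: Deepgram's nova-2 model, transcribing English
stt = DeepgramSTT(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    model="nova-2",
    language="en",
)

# LLM: OpenAI GPT-4o generates the agent's replies
llm = OpenAILLM(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o",
)

# Text-to-Speech: ElevenLabs; replace the placeholder with your own voice ID
tts = ElevenLabsTTS(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="your-voice-id",
)

# Voice Activity Detection: Silero VAD detects when the user is speaking
vad = SileroVAD(threshold=0.35)

# Turn detection: decides when the user has finished their turn
turn_detector = TurnDetector(threshold=0.8)

# Assemble the pipeline from the independently configured components
pipeline = CascadingPipeline(
    stt=stt,
    llm=llm,
    tts=tts,
    vad=vad,
    turn_detector=turn_detector,
)
```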
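The `threshold` values passed to `SileroVAD` and `TurnDetector` above are easiest to reason about as probability cut-offs. The sketch below assumes that interpretation: a frame counts as speech when its speech probability reaches the VAD threshold, and the turn is handed to the LLM only when the end-of-turn probability reaches the turn threshold. `is_speech` and `turn_is_over` are hypothetical helpers, not the plugins' actual internals.

```python
# Values copied from the example above; their exact semantics inside the
# plugins are an assumption here.
VAD_THRESHOLD = 0.35
TURN_THRESHOLD = 0.8


def is_speech(speech_prob: float, threshold: float = VAD_THRESHOLD) -> bool:
    """Assumed VAD rule: a frame is speech when its probability reaches the threshold."""
    return speech_prob >= threshold


def turn_is_over(end_prob: float, threshold: float = TURN_THRESHOLD) -> bool:
    """Assumed turn rule: respond only when end-of-turn confidence reaches the threshold."""
    return end_prob >= threshold


frames = [0.10, 0.40, 0.90, 0.30]
print([is_speech(p) for p in frames])          # [False, True, True, False]
print(turn_is_over(0.75), turn_is_over(0.85))  # False True
```

Under this reading, lowering the VAD threshold makes the agent more sensitive to quiet speech (and to noise), while raising the turn threshold makes it wait for stronger evidence that the user has finished before replying.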

Use Cases:

  • Multi-language Support - Use specialized STT for different languages
  • Cost Optimization - Mix premium and cost-effective services
  • Custom Voice Processing - Add domain-specific processing logic
  • Performance Optimization - Choose the fastest provider for each component
  • Compliance Requirements - Use specific providers for regulatory compliance
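For the multi-language use case, one simple pattern is a lookup table that maps a language code to the STT provider and model to use, with a fallback default. The sketch below is hypothetical: only the `"en"` entry mirrors the Deepgram `nova-2` configuration from the example above, and the `"hi"` entry uses placeholder names that you would replace with a real provider and model. The selected entry would then be used to construct the matching STT plugin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STTChoice:
    provider: str
    model: str


# Hypothetical routing table. Only "en" mirrors the example above;
# the "hi" entry is a placeholder, not a real provider or model name.
STT_BY_LANGUAGE = {
    "en": STTChoice(provider="deepgram", model="nova-2"),
    "hi": STTChoice(provider="provider-for-hindi", model="model-for-hindi"),
}
DEFAULT_LANGUAGE = "en"


def pick_stt(language: str) -> STTChoice:
    """Return the configured STT for a language, falling back to the default."""
    return STT_BY_LANGUAGE.get(language, STT_BY_LANGUAGE[DEFAULT_LANGUAGE])


print(pick_stt("hi").provider)  # provider-for-hindi
print(pick_stt("fr").model)     # nova-2 (no French entry, so the default is used)
```

The same table-driven approach extends to cost or compliance rules: add fields such as a region or price tier to `STTChoice` and select on them instead of, or in addition to, the language.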
