Cascading Pipeline
The Cascading Pipeline component provides a flexible, modular approach to building AI agents by allowing you to mix and match different components for Speech-to-Text (STT), Large Language Models (LLM), Text-to-Speech (TTS), Voice Activity Detection (VAD), and Turn Detection.
Key Features:
- Modular Component Selection - Choose different providers for each component
- Flexible Configuration - Mix and match STT, LLM, TTS, VAD, and Turn Detection
- Custom Processing - Add custom processing for STT and LLM outputs
- Provider Agnostic - Support for multiple AI service providers
- Advanced Control - Fine-tune each component independently
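To make the cascade idea concrete, here is a minimal, library-independent sketch of how audio could flow through STT, LLM, and TTS stages, with optional hooks for custom processing of the STT and LLM outputs. The class `ToyCascade`, the hook names `on_transcript` and `on_reply`, and the fake stage functions are all hypothetical. They are not part of the VideoSDK API and only illustrate the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in types: each stage is modeled as a plain function.
SttFn = Callable[[bytes], str]    # audio -> transcript
LlmFn = Callable[[str], str]      # transcript -> reply text
TtsFn = Callable[[str], bytes]    # reply text -> audio
TextHook = Callable[[str], str]   # custom post-processing of text


@dataclass
class ToyCascade:
    """Illustrative only: chains STT -> LLM -> TTS with optional text hooks."""

    stt: SttFn
    llm: LlmFn
    tts: TtsFn
    on_transcript: Optional[TextHook] = None  # runs on the STT output
    on_reply: Optional[TextHook] = None       # runs on the LLM output

    def run(self, audio: bytes) -> bytes:
        text = self.stt(audio)
        if self.on_transcript is not None:
            text = self.on_transcript(text)
        reply = self.llm(text)
        if self.on_reply is not None:
            reply = self.on_reply(reply)
        return self.tts(reply)


# Fake stages so the sketch runs without any provider or network access.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


cascade = ToyCascade(
    stt=fake_stt,
    llm=fake_llm,
    tts=fake_tts,
    on_transcript=str.strip,  # e.g. normalize the transcript
    on_reply=str.upper,       # e.g. transform the model's reply
)
result = cascade.run(b"  hello  ")
```

Because each stage is an independent callable, swapping one provider for another only changes the argument passed for that stage; the rest of the chain is untouched.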
Example Implementation:
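import os

from videosdk.agents import CascadingPipeline
from videosdk.plugins.deepgram import DeepgramSTT
# ElevenLabsTTS was used below without an import; this path is assumed to
# follow the same plugin naming pattern as the other providers.
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector

# Speech-to-Text: transcribes the user's audio.
stt = DeepgramSTT(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    model="nova-2",
    language="en",
)

# Large Language Model: generates the agent's reply from the transcript.
llm = OpenAILLM(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o",
)

# Text-to-Speech: converts the reply into audio.
tts = ElevenLabsTTS(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="your-voice-id",
)

# Voice Activity Detection: detects when speech is present.
vad = SileroVAD(threshold=0.35)

# Turn Detection: decides when the user has finished speaking.
turn_detector = TurnDetector(threshold=0.8)

# Assemble the independently configured components into one pipeline.
pipeline = CascadingPipeline(
    stt=stt, llm=llm, tts=tts, vad=vad, turn_detector=turn_detector
)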
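Because `os.getenv` returns `None` when a variable is unset, a missing API key may only surface later, once a provider rejects the request. A small guard like the one sketched below fails early instead. The helper names `require_env` and `missing_keys` are hypothetical and not part of the SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# The three keys read by the example configuration above.
REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(names=REQUIRED_KEYS) -> list:
    """List the variables from `names` that are unset or empty."""
    return [name for name in names if not os.getenv(name)]
```

Calling `missing_keys()` before constructing any component gives one consolidated report, while `require_env("OPENAI_API_KEY")` can replace an individual `os.getenv` call where a hard failure is preferred.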
Use Cases:
- Multi-language Support - Use specialized STT for different languages
- Cost Optimization - Mix premium and cost-effective services
- Custom Voice Processing - Add domain-specific processing logic
- Performance Optimization - Choose fastest providers for each component
- Compliance Requirements - Use specific providers for regulatory compliance
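Several of these use cases come down to choosing component settings per session. One way to sketch that is a lookup table keyed by language, with a fallback for languages that have no entry. The table contents, the `provider` labels, and the function `select_stt_settings` are illustrative assumptions; check each provider's documentation for the models and language codes it actually supports.

```python
from typing import Dict

# Hypothetical routing table: language code -> settings for an STT component.
# Entries are illustrative; verify model and language support with the provider.
STT_SETTINGS: Dict[str, Dict[str, str]] = {
    "en": {"provider": "deepgram", "model": "nova-2", "language": "en"},
    "es": {"provider": "deepgram", "model": "nova-2", "language": "es"},
}
DEFAULT_LANGUAGE = "en"


def select_stt_settings(language: str) -> Dict[str, str]:
    """Pick settings for a tag such as 'es' or 'es-MX', falling back to the default."""
    base = language.split("-")[0].lower()  # 'es-MX' -> 'es'
    chosen = STT_SETTINGS.get(base, STT_SETTINGS[DEFAULT_LANGUAGE])
    return dict(chosen)  # copy so callers cannot mutate the table
```

The same table could carry cost or compliance attributes, so that one lookup decides which provider class to instantiate for a given session.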