Silero VAD

The Silero VAD (Voice Activity Detection) provider lets your agent detect when users start and stop speaking. When added to a cascading pipeline, it automatically enables interruptions, so users can cut in while the agent is mid-response.

Installation

Install the Silero VAD plugin for VideoSDK Agents:

pip install "videosdk-plugins-silero"

Importing

from videosdk.plugins.silero import SileroVAD

Example Usage

from videosdk.plugins.silero import SileroVAD
from videosdk.agents import CascadingPipeline

# Initialize the Silero VAD
vad = SileroVAD(
    input_sample_rate=48000,
    model_sample_rate=16000,
    threshold=0.3,
    min_speech_duration=0.1,
    min_silence_duration=0.75,
    prefix_padding_duration=0.3,
)

# Add VAD to cascading pipeline - automatically enables interrupts
pipeline = CascadingPipeline(vad=vad)
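
To get a feel for the values in the example above, it can help to convert the time-based settings into sample counts at both sample rates. The sketch below is plain Python arithmetic and does not call the plugin; `duration_to_samples` is a hypothetical helper, and whether the plugin performs this exact conversion internally is an assumption.

```python
def duration_to_samples(seconds: float, sample_rate: int) -> int:
    """Convert a duration in seconds to a whole number of samples."""
    return round(seconds * sample_rate)


input_sample_rate = 48000  # rate of the incoming audio, as in the example
model_sample_rate = 16000  # rate the model expects, as in the example

# Ratio between the two rates: 48 kHz input carries 3 samples
# for every sample at the model's 16 kHz rate.
resample_factor = input_sample_rate // model_sample_rate

# Time-based settings copied from the example above.
settings = {
    "min_speech_duration": 0.1,
    "min_silence_duration": 0.75,
    "prefix_padding_duration": 0.3,
}

for name, seconds in settings.items():
    at_input = duration_to_samples(seconds, input_sample_rate)
    at_model = duration_to_samples(seconds, model_sample_rate)
    print(f"{name}: {at_input} input samples, {at_model} model samples")
```

For instance, the 0.75 s `min_silence_duration` corresponds to 36000 samples of 48 kHz input, or 12000 samples at the 16 kHz model rate.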

Configuration Options

  • input_sample_rate: (int) Sample rate of input audio in Hz (default: 48000)
  • model_sample_rate: (Literal[8000, 16000]) Model's expected sample rate (default: 16000)
  • threshold: (float) Speech probability threshold for voice activity detection; lower values are more sensitive (0.0 to 1.0, default: 0.3)
  • min_speech_duration: (float) Minimum speech duration to trigger detection in seconds (default: 0.1)
  • min_silence_duration: (float) Minimum silence duration to end speech detection in seconds (default: 0.75)
  • max_buffered_speech: (float) Maximum speech buffer duration in seconds (default: 60.0)
  • force_cpu: (bool) Force CPU usage instead of GPU acceleration (default: True)
  • prefix_padding_duration: (float) Audio padding before speech detection in seconds (default: 0.3)
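
The way `threshold`, `min_speech_duration`, and `min_silence_duration` interact can be illustrated with a small gating loop. This is a conceptual sketch only, not the plugin's actual implementation: it assumes the model yields one speech probability per audio frame, that `threshold` is a cutoff on that probability, and that each frame covers 32 ms. `GateConfig`, `detect_events`, and `frame_duration` are hypothetical names introduced for the illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class GateConfig:
    threshold: float = 0.3
    min_speech_duration: float = 0.1
    min_silence_duration: float = 0.75
    frame_duration: float = 0.032  # assumption: seconds of audio per probability


def detect_events(probs: Iterable[float], cfg: GateConfig) -> List[Tuple[str, int]]:
    """Return ("start" | "end", frame_index) events for per-frame speech probabilities."""
    # Number of consecutive frames needed to confirm speech or silence.
    min_speech_frames = math.ceil(cfg.min_speech_duration / cfg.frame_duration)
    min_silence_frames = math.ceil(cfg.min_silence_duration / cfg.frame_duration)

    events: List[Tuple[str, int]] = []
    speaking = False
    speech_run = 0   # consecutive frames at or above the threshold
    silence_run = 0  # consecutive frames below the threshold

    for i, p in enumerate(probs):
        if p >= cfg.threshold:
            speech_run += 1
            silence_run = 0
            # Speech starts only after it has lasted long enough.
            if not speaking and speech_run >= min_speech_frames:
                speaking = True
                events.append(("start", i))
        else:
            silence_run += 1
            speech_run = 0
            # Speech ends only after silence has lasted long enough.
            if speaking and silence_run >= min_silence_frames:
                speaking = False
                events.append(("end", i))
    return events


# A 2-frame blip is ignored; a 10-frame burst followed by long silence is not.
probs = [0.1] * 2 + [0.9] * 2 + [0.1] * 3 + [0.9] * 10 + [0.1] * 30
print(detect_events(probs, GateConfig()))  # [('start', 10), ('end', 40)]
```

Under these assumptions, a larger `min_speech_duration` filters out short noises at the cost of slower interrupt detection, while a larger `min_silence_duration` tolerates longer pauses before a turn is considered finished.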
