Silero VAD
The Silero VAD (Voice Activity Detection) provider lets your agent detect when users start and stop speaking. When added to a cascading pipeline, it automatically enables interruptions, so users can cut in while the agent is mid-response.
Installation
Install the Silero VAD plugin for VideoSDK Agents:
pip install "videosdk-plugins-silero"
Importing
from videosdk.plugins.silero import SileroVAD
Example Usage
from videosdk.plugins.silero import SileroVAD
from videosdk.agents import CascadingPipeline
# Initialize the Silero VAD
vad = SileroVAD(
    input_sample_rate=48000,
    model_sample_rate=16000,
    threshold=0.3,
    min_speech_duration=0.1,
    min_silence_duration=0.75,
    prefix_padding_duration=0.3,
)
# Add VAD to cascading pipeline - automatically enables interrupts
pipeline = CascadingPipeline(vad=vad)
Configuration Options
input_sample_rate: (int) Sample rate of input audio in Hz (default: 48000)
model_sample_rate: (Literal[8000, 16000]) Model's expected sample rate in Hz (default: 16000)
threshold: (float) Voice activity detection sensitivity, from 0.0 to 1.0 (default: 0.3)
min_speech_duration: (float) Minimum speech duration to trigger detection, in seconds (default: 0.1)
min_silence_duration: (float) Minimum silence duration to end speech detection, in seconds (default: 0.75)
max_buffered_speech: (float) Maximum speech buffer duration, in seconds (default: 60.0)
force_cpu: (bool) Force CPU usage instead of GPU acceleration (default: True)
prefix_padding_duration: (float) Audio padding before speech detection, in seconds (default: 0.3)
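To build intuition for how threshold, min_speech_duration, and min_silence_duration interact, here is a minimal, self-contained sketch. It is not the plugin's implementation: VadTiming, segment_speech, and the frame_ms argument are hypothetical names for illustration, and the sketch assumes the model yields one speech probability per fixed-length audio frame.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VadTiming:
    # Hypothetical container mirroring the documented defaults.
    threshold: float = 0.3
    min_speech_duration: float = 0.1    # seconds
    min_silence_duration: float = 0.75  # seconds


def segment_speech(
    probs: List[float], frame_ms: int, cfg: VadTiming
) -> List[Tuple[str, int]]:
    """Turn per-frame speech probabilities into ("start" | "end", time_ms) events.

    Illustrative only: assumes one probability per frame of frame_ms milliseconds.
    """
    # Work in integer milliseconds to avoid floating-point accumulation error.
    min_speech_ms = round(cfg.min_speech_duration * 1000)
    min_silence_ms = round(cfg.min_silence_duration * 1000)

    events: List[Tuple[str, int]] = []
    speaking = False
    speech_run_ms = 0   # consecutive time at or above the threshold
    silence_run_ms = 0  # consecutive time below the threshold while speaking

    for i, p in enumerate(probs):
        frame_end_ms = (i + 1) * frame_ms
        if p >= cfg.threshold:
            silence_run_ms = 0
            speech_run_ms += frame_ms
            # Speech must persist for min_speech_duration before it counts.
            if not speaking and speech_run_ms >= min_speech_ms:
                speaking = True
                events.append(("start", frame_end_ms))
        else:
            speech_run_ms = 0
            if speaking:
                silence_run_ms += frame_ms
                # Silence must persist for min_silence_duration to end speech.
                if silence_run_ms >= min_silence_ms:
                    speaking = False
                    silence_run_ms = 0
                    events.append(("end", frame_end_ms))
    return events
```

Under these assumptions, a single loud frame shorter than min_speech_duration is ignored, and a pause shorter than min_silence_duration does not end the utterance. Lowering threshold makes detection more sensitive, while raising min_silence_duration makes the agent wait longer before treating the user as finished.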