Voice Activity Detection (VAD)
First, configure VAD to detect the presence of speech. This helps manage interruptions and acts as a first-pass filter before any transcript-based checks.
from videosdk.agents.plugins import SileroVAD

# Configure VAD to detect speech activity
vad = SileroVAD(
    threshold=0.5,              # Sensitivity to speech (0.3-0.8)
    min_speech_duration=0.1,    # Ignore very brief sounds
    min_silence_duration=0.75,  # Wait time before considering speech ended
)
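
To make the interaction between the three parameters concrete, here is a minimal sketch of a gate that turns per-frame speech probabilities into start and end events. It is not SileroVAD's implementation: the `SimpleVadGate` class, its `process` method, and the `frame_ms` frame length are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimpleVadGate:
    """Illustrative gate only; not how SileroVAD works internally."""
    threshold: float = 0.5
    min_speech_duration: float = 0.1    # seconds
    min_silence_duration: float = 0.75  # seconds
    frame_ms: int = 50                  # hypothetical frame length (assumption)
    speaking: bool = field(default=False, init=False)
    _speech_ms: int = field(default=0, init=False)
    _silence_ms: int = field(default=0, init=False)

    def process(self, speech_probability: float) -> Optional[str]:
        """Feed one frame's probability; return "start", "end", or None."""
        if speech_probability >= self.threshold:
            # Frame counts as speech: extend the speech run, reset silence.
            self._speech_ms += self.frame_ms
            self._silence_ms = 0
            if not self.speaking and self._speech_ms >= round(self.min_speech_duration * 1000):
                self.speaking = True
                return "start"
        else:
            # Frame counts as silence: extend the silence run, reset speech.
            self._silence_ms += self.frame_ms
            self._speech_ms = 0
            if self.speaking and self._silence_ms >= round(self.min_silence_duration * 1000):
                self.speaking = False
                return "end"
        return None
```

In this sketch, a single loud frame never produces a "start" event because it is shorter than `min_speech_duration`, and speech is only considered ended after `min_silence_duration` of uninterrupted silence.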
Interruption Detection (VAD + STT)
Interruption Detection controls when the system should treat user speech as an intentional interruption. It evaluates both voice activity and recognized speech content to avoid triggering interruptions from short noises, filler words, or background audio. The agent only stops or responds when the user clearly intends to speak.
Configuration Example (HYBRID mode)
pipeline = Pipeline(
    # ... other config
    interrupt_config=InterruptConfig(
        mode="HYBRID",
        interrupt_min_duration=0.2,  # 200ms of continuous speech
        interrupt_min_words=2,       # At least 2 words recognized
    )
)
VAD_ONLY mode
pipeline = Pipeline(
    # ... other config
    interrupt_config=InterruptConfig(
        mode="VAD_ONLY",
        interrupt_min_duration=0.2,  # 200ms of continuous speech
    )
)
STT_ONLY mode
pipeline = Pipeline(
    # ... other config
    interrupt_config=InterruptConfig(
        mode="STT_ONLY",
        interrupt_min_words=2,  # At least 2 words recognized
    )
)
Configuration Parameters
| Parameter | Type | Description |
|---|---|---|
| mode | str | • HYBRID: Combines VAD and STT. Requires both audio detection and recognized words to trigger an interruption. • VAD_ONLY: Uses only raw speech activity detection. Faster but may be triggered by background noise. • STT_ONLY: Relies only on recognized words from the transcript. Slower but ensures speech is intelligible. |
| interrupt_min_duration | float | Minimum duration (in seconds) of continuous speech required to trigger an interruption. |
| interrupt_min_words | int | Minimum number of words that must be recognized (used in HYBRID and STT_ONLY modes). |
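
The three modes can be read as a simple decision rule over the two thresholds. The sketch below assumes exactly that reading; `should_interrupt` is a hypothetical helper written for illustration and is not part of the SDK.

```python
def should_interrupt(
    mode: str,
    speech_duration: float,
    word_count: int,
    interrupt_min_duration: float = 0.2,
    interrupt_min_words: int = 2,
) -> bool:
    """Hypothetical helper: decide whether user speech counts as an interruption."""
    duration_ok = speech_duration >= interrupt_min_duration  # VAD-side check
    words_ok = word_count >= interrupt_min_words             # STT-side check

    if mode == "VAD_ONLY":
        return duration_ok
    if mode == "STT_ONLY":
        return words_ok
    if mode == "HYBRID":
        # Both signals must agree before the agent is interrupted.
        return duration_ok and words_ok
    raise ValueError(f"Unknown interrupt mode: {mode!r}")
```

Under this reading, a half-second cough with no recognized words would interrupt the agent in VAD_ONLY mode but not in HYBRID or STT_ONLY mode.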
False-Interruption Recovery
The False-Interruption Recovery feature detects accidental or brief user noises and allows the agent to automatically resume speaking when interruptions are not genuine.
Configuration Example
pipeline = Pipeline(
    # ... other config
    interrupt_config=InterruptConfig(
        false_interrupt_pause_duration=2.0,  # Wait 2 seconds to confirm interruption
        resume_on_false_interrupt=True,      # Auto-resume if interruption is brief
    )
)
Configuration Parameters
| Parameter | Type | Description |
|---|---|---|
| false_interrupt_pause_duration | float | Duration (in seconds) to wait after detecting an interruption before considering it false. If the user doesn't continue speaking within this time, the interruption is considered accidental and the agent resumes. |
| resume_on_false_interrupt | bool | If True, the agent will automatically resume speaking after detecting a false interruption. If False, the agent will remain paused even after brief interruptions. |
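
As a rough illustration of how these two settings combine, the sketch below models the outcome once the agent has paused. `resolve_interruption` and its return labels are hypothetical and only mirror the behavior described in the table; the SDK's internal logic may differ.

```python
from typing import Optional


def resolve_interruption(
    seconds_until_user_continued: Optional[float],
    false_interrupt_pause_duration: float = 2.0,
    resume_on_false_interrupt: bool = True,
) -> str:
    """Hypothetical helper: what the agent does after pausing for an interruption.

    seconds_until_user_continued is None if the user never continued speaking.
    """
    user_continued = (
        seconds_until_user_continued is not None
        and seconds_until_user_continued <= false_interrupt_pause_duration
    )
    if user_continued:
        # Genuine interruption: the user kept talking within the wait window.
        return "yield_to_user"
    # False interruption: resume only if the feature is enabled.
    return "resume_speaking" if resume_on_false_interrupt else "stay_paused"
```

With the defaults shown, a user who speaks again after 0.8 seconds keeps the floor, while a one-off noise followed by silence lets the agent pick up where it left off.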