Version: 1.0.x

Voice Activity Detection (VAD)

First, configure VAD to detect the presence of speech. This helps manage interruptions and acts as a first-pass filter.

from videosdk.agents.plugins import SileroVAD

# Configure VAD to detect speech activity
vad = SileroVAD(
    threshold=0.5,              # Sensitivity to speech (0.3-0.8)
    min_speech_duration=0.1,    # Ignore very brief sounds
    min_silence_duration=0.75,  # Wait time before considering speech ended
)
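
To make the three parameters concrete, here is a minimal sketch of how a threshold, a minimum speech duration, and a minimum silence duration can turn per-frame speech probabilities into speech segments. It is an illustration only, not SileroVAD's implementation: `VadSettings`, `detect_segments`, and the 50 ms frame size are hypothetical names and values chosen for this example.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class VadSettings:
    """Hypothetical container mirroring the three options shown above."""
    threshold: float = 0.5
    min_speech_duration: float = 0.1    # seconds
    min_silence_duration: float = 0.75  # seconds


def _frames(seconds: float, frame_ms: int) -> int:
    # Convert a duration to a whole number of frames (at least one).
    return max(1, math.ceil(round(seconds * 1000) / frame_ms))


def detect_segments(
    probs: Sequence[float], frame_ms: int, settings: VadSettings
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame probabilities."""
    need_speech = _frames(settings.min_speech_duration, frame_ms)
    need_silence = _frames(settings.min_silence_duration, frame_ms)
    segments: List[Tuple[int, int]] = []
    in_speech = False
    run = 0          # consecutive frames arguing for a state change
    seg_start = 0    # frame index where the current segment began
    last_voiced = 0  # index one past the most recent voiced frame
    for i, p in enumerate(probs):
        voiced = p >= settings.threshold
        if not in_speech:
            run = run + 1 if voiced else 0
            if run >= need_speech:
                # Enough continuous speech: open a segment at the run's start.
                in_speech, seg_start, last_voiced, run = True, i - run + 1, i + 1, 0
        elif voiced:
            run, last_voiced = 0, i + 1
        else:
            run += 1
            if run >= need_silence:
                # Enough continuous silence: close the segment.
                segments.append((seg_start * frame_ms, last_voiced * frame_ms))
                in_speech, run = False, 0
    if in_speech:
        segments.append((seg_start * frame_ms, last_voiced * frame_ms))
    return segments


# A 50 ms click, a 200 ms utterance, then speech containing a 250 ms pause.
example = [0.9] + [0.1] * 2 + [0.9] * 4 + [0.1] * 15 + [0.9] * 3 + [0.1] * 5 + [0.9] * 2
print(detect_segments(example, 50, VadSettings()))  # [(150, 350), (1100, 1600)]
```

In this sketch the click is dropped because it is shorter than `min_speech_duration`, and the 250 ms pause does not split the last segment because it is shorter than `min_silence_duration`.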

Interruption Detection (VAD + STT)

Interruption Detection controls when the system treats user speech as an intentional interruption. It evaluates both voice activity and recognized speech content so that short noises, filler words, and background audio do not trigger interruptions. The agent stops or responds only when the user clearly intends to speak.

Configuration Example (HYBRID mode)

pipeline = Pipeline(
    # ... other config
    interrupt_config=InterruptConfig(
        mode="HYBRID",
        interrupt_min_duration=0.2,  # 200ms of continuous speech
        interrupt_min_words=2,       # At least 2 words recognized
    )
)

VAD_ONLY mode

pipeline = Pipeline(
    # ... other config
    interrupt_config=InterruptConfig(
        mode="VAD_ONLY",
        interrupt_min_duration=0.2,  # 200ms of continuous speech
    )
)

STT_ONLY mode

pipeline = Pipeline(
    # ... other config
    interrupt_config=InterruptConfig(
        mode="STT_ONLY",
        interrupt_min_words=2,  # At least 2 words recognized
    )
)

Configuration Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| mode | str | HYBRID: Combines VAD and STT. Requires both audio detection and recognized words to trigger an interruption. VAD_ONLY: Uses only raw speech activity detection. Faster, but may be triggered by background noise. STT_ONLY: Relies only on recognized words from the transcript. Slower, but ensures speech is intelligible. |
| interrupt_min_duration | float | Minimum duration (in seconds) of continuous speech required to trigger an interruption. |
| interrupt_min_words | int | Minimum number of words that must be recognized (used in HYBRID and STT_ONLY modes). |
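
The difference between the three modes can be sketched as a small decision function. This is a hypothetical illustration of the rules described in the table, not the SDK's internal logic; `should_interrupt` and its arguments are names invented for this example.

```python
def should_interrupt(
    mode: str,
    speech_seconds: float,
    word_count: int,
    interrupt_min_duration: float = 0.2,
    interrupt_min_words: int = 2,
) -> bool:
    """Decide whether detected user audio counts as an interruption (sketch)."""
    long_enough = speech_seconds >= interrupt_min_duration
    enough_words = word_count >= interrupt_min_words
    if mode == "HYBRID":
        return long_enough and enough_words  # both signals must agree
    if mode == "VAD_ONLY":
        return long_enough                   # audio activity alone
    if mode == "STT_ONLY":
        return enough_words                  # transcript alone
    raise ValueError(f"unknown mode: {mode!r}")


# A 300 ms cough produces voice activity but no recognized words.
for mode in ("HYBRID", "VAD_ONLY", "STT_ONLY"):
    print(mode, should_interrupt(mode, speech_seconds=0.3, word_count=0))
```

Under these assumptions the cough interrupts the agent only in VAD_ONLY mode, which is the background-noise sensitivity the table warns about.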

False-Interruption Recovery

The False-Interruption Recovery feature detects accidental or brief user noises and allows the agent to automatically resume speaking when interruptions are not genuine.

Configuration Example

pipeline = Pipeline(
    # ... other config
    interrupt_config=InterruptConfig(
        false_interrupt_pause_duration=2.0,  # Wait 2 seconds to confirm interruption
        resume_on_false_interrupt=True,      # Auto-resume if interruption is brief
    )
)

Configuration Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| false_interrupt_pause_duration | float | Duration (in seconds) to wait after detecting an interruption before considering it false. If the user doesn't continue speaking within this time, the interruption is considered accidental and the agent resumes. |
| resume_on_false_interrupt | bool | If True, the agent will automatically resume speaking after detecting a false interruption. If False, the agent will remain paused even after brief interruptions. |
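
How the two settings combine can be sketched as follows. This is a simplified, hypothetical model of the behavior described above, not the SDK's implementation; `after_interruption` and its return labels are invented for illustration.

```python
from typing import Optional


def after_interruption(
    seconds_until_user_continued: Optional[float],
    false_interrupt_pause_duration: float = 2.0,
    resume_on_false_interrupt: bool = True,
) -> str:
    """Classify what the agent does after pausing for a possible interruption.

    seconds_until_user_continued is None when the user never spoke again.
    """
    user_continued = (
        seconds_until_user_continued is not None
        and seconds_until_user_continued <= false_interrupt_pause_duration
    )
    if user_continued:
        return "yield_to_user"  # genuine interruption
    # The wait window elapsed without more speech: a false interruption.
    return "resume" if resume_on_false_interrupt else "stay_paused"


print(after_interruption(0.8))    # user kept talking within the window
print(after_interruption(None))   # brief noise, then silence
print(after_interruption(None, resume_on_false_interrupt=False))
```

A longer `false_interrupt_pause_duration` gives slow speakers more time to continue, at the cost of a longer silence before the agent resumes after a stray noise.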
