Version: 1.0.x

De-noise

De-noise improves audio quality in your AI agent conversations by filtering out background noise. This creates more professional and engaging interactions, especially in noisy environments.

Overview

The VideoSDK Agents framework provides real-time audio denoising capabilities via the RNNoise plugin that:

  • Remove Background Noise: Filters out ambient sounds, keyboard typing, air conditioning, and other distractions
  • Enhance Voice Clarity: Improves speech intelligibility and quality
  • Work in Real-time: Processes audio with minimal latency during live conversations
  • Integrate Seamlessly: Works with Pipeline in both cascading and realtime modes

What De-noise Solves

Without noise removal, your agents may struggle with:

  • Poor audio quality affecting transcription accuracy
  • Background noise interfering with conversation flow
  • Unprofessional sound quality in business applications
  • Difficulty understanding users in noisy environments

With De-noise, you get:

  • Crystal clear audio for better user experience
  • Improved speech-to-text accuracy
  • Professional-grade audio quality
  • Better performance in various acoustic environments

RNNoise Implementation

RNNoise is a real-time noise suppression library that uses deep learning to distinguish between speech and noise, providing effective background noise removal.

Key Features

  • Real-time Processing: Low-latency noise removal suitable for live conversations
  • Adaptive Filtering: Automatically adjusts to different types of background noise
  • Speech Preservation: Maintains voice quality while removing unwanted sounds
  • Lightweight: Efficient processing with minimal computational overhead
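
To make the real-time claim concrete: the upstream RNNoise library processes mono audio at 48 kHz in 10 ms frames of 480 samples. How the VideoSDK plugin buffers audio internally is not described on this page, so the following is only an illustrative sketch; `split_into_frames` is a hypothetical helper, not part of the plugin.

```python
from typing import List

# 10 ms of mono audio at 48 kHz, the frame size used by the upstream RNNoise library.
FRAME_SIZE = 480


def split_into_frames(samples: List[int], frame_size: int = FRAME_SIZE) -> List[List[int]]:
    """Split PCM samples into fixed-size frames, zero-padding the final frame.

    Hypothetical helper for illustration; the plugin's own buffering may differ.
    """
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if len(frame) < frame_size:
            # Pad the trailing partial frame with silence.
            frame = frame + [0] * (frame_size - len(frame))
        frames.append(frame)
    return frames


# 25 ms of silence at 48 kHz is 1200 samples: two full frames plus one padded frame.
frames = split_into_frames([0] * 1200)
print(len(frames), len(frames[0]))
```

Because each frame covers only 10 ms, a frame-based denoiser adds little buffering delay on top of its compute time, which is consistent with the low-latency behavior described above.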

Basic Setup

from videosdk.agents.plugins import RNNoise  

# Initialize noise removal
denoise = RNNoise()

Pipeline Integration

Add noise removal to your cascading pipeline:

main.py
from videosdk.agents import Agent, Pipeline, AgentSession
from videosdk.agents.plugins import RNNoise
# Add your preferred providers
from videosdk.agents.plugins import DeepgramSTT, OpenAILLM, ElevenLabsTTS, SileroVAD

class EnhancedVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a professional assistant with crystal-clear audio quality. Help users with their questions while maintaining excellent conversation flow."
        )

    async def on_enter(self):
        await self.session.say("Hello! I'm here with enhanced audio quality for our conversation.")

    async def on_exit(self):
        await self.session.say("Goodbye! It was great talking with you.")

# Set up pipeline with noise removal
pipeline = Pipeline(
    stt=DeepgramSTT(api_key="your-deepgram-key"),
    llm=OpenAILLM(api_key="your-openai-key", model="gpt-4"),
    tts=ElevenLabsTTS(api_key="your-elevenlabs-key", voice_id="your-voice-id"),
    vad=SileroVAD(),
    denoise=RNNoise(),  # Enable noise removal
)

# Create and start session
async def main():
    session = AgentSession(agent=EnhancedVoiceAgent(), pipeline=pipeline)
    await session.start()

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

VideoSDK Inference De-noise

In addition to the local RNNoise plugin, you can run server-side noise cancellation through the VideoSDK Inference Gateway. The heavy lifting (model loading, inference, resampling) happens on the gateway, so you don't need to bundle any model locally or supply a provider API key. Authentication is handled with your VIDEOSDK_AUTH_TOKEN.

The Inference denoise classes (AICousticsDenoise and SanasDenoise) connect over a WebSocket and expose AI-Coustics and Sanas noise cancellation through a common interface.

Setup Authentication

VIDEOSDK_AUTH_TOKEN="your-videosdk-auth-token"
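
Since the gateway relies on this variable, it can help to fail fast when it is missing. The sketch below assumes the token is read from the process environment; `require_auth_token` is a hypothetical helper and not part of the VideoSDK API.

```python
import os
from typing import Mapping


def require_auth_token(env: Mapping[str, str] = os.environ) -> str:
    """Return VIDEOSDK_AUTH_TOKEN, or raise a clear error if it is missing or blank.

    Hypothetical helper for illustration; call it before building the pipeline.
    """
    token = env.get("VIDEOSDK_AUTH_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "VIDEOSDK_AUTH_TOKEN is not set; export it before starting the agent."
        )
    return token


# Typical use at startup:
#     require_auth_token()
```

Checking once at startup surfaces a configuration mistake immediately instead of as a connection failure later in the session.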

Importing

from videosdk.agents.inference import AICousticsDenoise, SanasDenoise

Configuration

AICousticsDenoise()

  • model_id: (str) AI-Coustics model ID (default: "sparrow-xxs-48khz").
  • sample_rate: (int) Audio sample rate in Hz. Use 48000 for Sparrow models and 16000 for Quail models (default: 48000).
  • channels: (int) Number of audio channels (default: 1 for mono).

SanasDenoise()

  • model_id: (str) Sanas model ID (default: "VI_G_NC3.0").
  • sample_rate: (int) Audio sample rate in Hz (default: 16000).
  • channels: (int) Number of audio channels (default: 1 for mono).

Supported Models

The following denoise models are available through the VideoSDK Inference Gateway.

Provider   | Model Name          | Model ID
Sanas      | SE2.1               | SE2.1
Sanas      | VI_G_NC3.0          | VI_G_NC3.0
Krisp      | Krisp VIVA Tel v2   | krisp-viva-tel-v2
AiCoustics | Quail-VF-L (16kHz)  | quail-vf-2.1-l-16khz
AiCoustics | Quail-VF-S (16kHz)  | quail-vf-2.1-s-16khz
AiCoustics | Rook Large (48kHz)  | rook-l-48khz
AiCoustics | Rook Small (48kHz)  | rook-s-48khz
AiCoustics | Rook Large (8kHz)   | rook-l-8khz
AiCoustics | Rook Small (8kHz)   | rook-s-8khz
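
The AI-Coustics model IDs listed here end in a sample-rate suffix such as "-16khz" or "-48khz". As a minimal sketch, and assuming that suffix always matches the rate the model expects (this page does not state that as a guarantee), the keyword arguments documented under Configuration could be derived from the ID. `denoise_kwargs` is a hypothetical helper.

```python
import re
from typing import Dict, Union


def denoise_kwargs(model_id: str, channels: int = 1) -> Dict[str, Union[str, int]]:
    """Build model_id / sample_rate / channels from an ID ending in '-<N>khz'.

    Hypothetical helper; assumes the suffix reflects the expected sample rate.
    """
    match = re.search(r"-(\d+)khz$", model_id)
    if match is None:
        # IDs without a rate suffix (for example the Sanas IDs) need an explicit rate.
        raise ValueError(f"No sample-rate suffix in model ID: {model_id!r}")
    return {
        "model_id": model_id,
        "sample_rate": int(match.group(1)) * 1000,
        "channels": channels,
    }


print(denoise_kwargs("quail-vf-2.1-l-16khz"))
```

The resulting dictionary matches the parameter names documented for AICousticsDenoise, so it could be unpacked as `AICousticsDenoise(**denoise_kwargs(...))`. The same rate can then be reused for the STT and VAD settings so that every stage agrees.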

Inference Pipeline Integration

main.py
from videosdk.agents import Agent, Pipeline, AgentSession
from videosdk.agents.plugins import DeepgramSTT, OpenAILLM, ElevenLabsTTS, SileroVAD
from videosdk.agents.inference import AICousticsDenoise

pipeline = Pipeline(
    stt=DeepgramSTT(sample_rate=48000),
    llm=OpenAILLM(),
    tts=ElevenLabsTTS(),
    vad=SileroVAD(input_sample_rate=48000),
    # Server-side noise cancellation via VideoSDK Inference Gateway
    denoise=AICousticsDenoise(model_id="sparrow-xxs-48khz"),
)

Audio Processing Flow

When noise removal is enabled, audio processing follows this flow:

  1. Raw Audio Input: Microphone captures audio with background noise
  2. Noise Removal: The configured denoiser (RNNoise or an Inference Gateway model) filters out unwanted sounds
  3. Enhanced Audio: Clean audio is passed to speech processing
  4. Improved Results: Better transcription and conversation quality
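
The ordering of these steps can be sketched in a few lines. This is a toy model, not the framework's implementation: `toy_noise_gate` is a simple amplitude gate standing in for a real denoiser such as RNNoise, and `process_audio` and `frame_energy` are hypothetical names used only for illustration.

```python
from typing import Callable, List, TypeVar

Frame = List[float]
T = TypeVar("T")


def toy_noise_gate(frame: Frame, threshold: float = 0.05) -> Frame:
    """Stand-in denoiser: zero out samples quieter than the threshold (NOT RNNoise)."""
    return [s if abs(s) >= threshold else 0.0 for s in frame]


def frame_energy(frame: Frame) -> float:
    """Stand-in for speech processing: just measures the energy of the cleaned frame."""
    return sum(s * s for s in frame)


def process_audio(
    frames: List[Frame],
    denoise: Callable[[Frame], Frame],
    downstream: Callable[[Frame], T],
) -> List[T]:
    """Run each raw frame through the denoiser first, then hand it downstream."""
    return [downstream(denoise(frame)) for frame in frames]


raw = [[0.01, -0.02, 0.5, -0.4]]                 # step 1: raw input with low-level noise
cleaned = [toy_noise_gate(f) for f in raw]       # step 2: noise removal
print(cleaned)
print(process_audio(raw, toy_noise_gate, frame_energy))  # steps 3-4: downstream sees clean audio
```

The point of the sketch is that the downstream stage never sees the raw frames, which is why transcription quality benefits from denoising.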
