De-noise
De-noise improves audio quality in your AI agent conversations by filtering out background noise. This creates more professional and engaging interactions, especially in noisy environments.
Overview
The VideoSDK Agents framework provides real-time audio denoising capabilities via the RNNoise plugin that:
- Remove Background Noise: Filters out ambient sounds, keyboard typing, air conditioning, and other distractions
- Enhance Voice Clarity: Improves speech intelligibility and quality
- Work in Real-time: Processes audio with minimal latency during live conversations
- Integrate Seamlessly: Works with Pipeline in both cascading and realtime modes
What De-noise Solves
Without noise removal, your agents may struggle with:
- Poor audio quality affecting transcription accuracy
- Background noise interfering with conversation flow
- Unprofessional sound quality in business applications
- Difficulty understanding users in noisy environments
With De-noise, you get:
- Crystal clear audio for better user experience
- Improved speech-to-text accuracy
- Professional-grade audio quality
- Better performance in various acoustic environments
RNNoise Implementation
RNNoise is a real-time noise suppression library that uses deep learning to distinguish between speech and noise, providing effective background noise removal.
Key Features
- Real-time Processing: Low-latency noise removal suitable for live conversations
- Adaptive Filtering: Automatically adjusts to different types of background noise
- Speech Preservation: Maintains voice quality while removing unwanted sounds
- Lightweight: Efficient processing with minimal computational overhead
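To make "low-latency" concrete: the upstream RNNoise library operates on short fixed-size frames of 480 samples, which is 10 ms of audio at 48 kHz. The sketch below only illustrates that framing arithmetic. `split_into_frames` is a hypothetical helper written for this example and is not part of the VideoSDK API; how the plugin frames audio internally is not covered here.

```python
# Framing arithmetic for upstream RNNoise (illustrative only).
FRAME_SIZE = 480       # samples per frame in upstream RNNoise
SAMPLE_RATE = 48_000   # Hz


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Split samples into fixed-size frames, zero-padding the last one.

    Hypothetical helper for illustration; not a VideoSDK function.
    """
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        frame.extend([0] * (frame_size - len(frame)))  # pad a short tail
        frames.append(frame)
    return frames


frame_ms = 1000 * FRAME_SIZE / SAMPLE_RATE
one_second = [0] * SAMPLE_RATE
frames = split_into_frames(one_second)
print(f"{frame_ms:.0f} ms per frame, {len(frames)} frames per second")
```

Because each frame covers only 10 ms, the buffering delay added by frame-based denoising stays small relative to a live conversation.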
Basic Setup
from videosdk.agents.plugins import RNNoise
# Initialize noise removal
denoise = RNNoise()
Pipeline Integration
Cascade
Add noise removal to your cascade pipeline:
from videosdk.agents import Agent, Pipeline, AgentSession
from videosdk.agents.plugins import RNNoise
# Add your preferred providers
from videosdk.agents.plugins import DeepgramSTT, OpenAILLM, ElevenLabsTTS, SileroVAD

class EnhancedVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a professional assistant with crystal-clear audio quality. Help users with their questions while maintaining excellent conversation flow."
        )

    async def on_enter(self):
        await self.session.say("Hello! I'm here with enhanced audio quality for our conversation.")

    async def on_exit(self):
        await self.session.say("Goodbye! It was great talking with you.")

# Set up pipeline with noise removal
pipeline = Pipeline(
    stt=DeepgramSTT(api_key="your-deepgram-key"),
    llm=OpenAILLM(api_key="your-openai-key", model="gpt-4"),
    tts=ElevenLabsTTS(api_key="your-elevenlabs-key", voice_id="your-voice-id"),
    vad=SileroVAD(),
    denoise=RNNoise()  # Enable noise removal
)

# Create and start session
async def main():
    session = AgentSession(agent=EnhancedVoiceAgent(), pipeline=pipeline)
    await session.start()

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Real-time Pipeline
Integrate with real-time models:
from videosdk.agents import Agent, Pipeline, AgentSession
from videosdk.agents.plugins import RNNoise
from videosdk.agents.plugins import OpenAIRealtime

class EnhancedRealtimeAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a professional assistant with crystal-clear audio quality. Engage in natural, real-time conversations while providing helpful responses."
        )

    async def on_enter(self):
        await self.session.say("Hello! I'm ready for a real-time conversation with enhanced audio quality.")

    async def on_exit(self):
        await self.session.say("Thank you for the conversation! Take care.")

# Set up real-time model
model = OpenAIRealtime(
    model="gpt-4o-realtime-preview",
    api_key="your-openai-key",
    voice="alloy"  # Choose from: alloy, echo, fable, onyx, nova, shimmer
)

# Set up pipeline with noise removal
pipeline = Pipeline(
    llm=model,
    denoise=RNNoise()  # Enable noise removal
)

# Create and start session
async def main():
    session = AgentSession(agent=EnhancedRealtimeAgent(), pipeline=pipeline)
    await session.start()

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
VideoSDK Inference De-noise
In addition to the local RNNoise plugin, you can run server-side noise cancellation through the VideoSDK Inference Gateway. The heavy lifting (model loading, inference, resampling) happens on the gateway, so you don't need to bundle any model locally or provide a provider API key. Authentication is handled with your VIDEOSDK_AUTH_TOKEN.
The Inference Denoise class connects over a WebSocket and supports AI-Coustics and Sanas noise cancellation through a single interface.
Setup Authentication
VIDEOSDK_AUTH_TOKEN="your-videosdk-auth-token"
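Since the gateway relies on this token, it can help to fail fast when it is missing. The sketch below assumes the token is supplied as an environment variable (for example, loaded from a `.env` file); `require_auth_token` is a hypothetical helper for this example, not a VideoSDK function.

```python
import os


def require_auth_token(env=os.environ):
    """Return the VideoSDK auth token, or raise a clear error if it is missing.

    Hypothetical helper; assumes the token lives in VIDEOSDK_AUTH_TOKEN.
    """
    token = env.get("VIDEOSDK_AUTH_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "VIDEOSDK_AUTH_TOKEN is not set; export it before starting the agent."
        )
    return token


# Demo with an explicit mapping so the example runs without a real token.
print(require_auth_token({"VIDEOSDK_AUTH_TOKEN": "demo-token"}))
```

Calling `require_auth_token()` with no arguments at the top of your entry point checks the real environment before any session is created.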
Importing
from videosdk.agents.inference import AICousticsDenoise, SanasDenoise
Configuration
AICousticsDenoise()
- model_id: (str) AI-Coustics model ID (default: "sparrow-xxs-48khz").
- sample_rate: (int) Audio sample rate in Hz. Use 48000 for Sparrow models and 16000 for Quail models (default: 48000).
- channels: (int) Number of audio channels (default: 1 for mono).
SanasDenoise()
- model_id: (str) Sanas model ID (default: "VI_G_NC3.0").
- sample_rate: (int) Audio sample rate in Hz (default: 16000).
- channels: (int) Number of audio channels (default: 1 for mono).
Supported Models
The following denoise models are available through the VideoSDK Inference Gateway.
| Provider | Model Name | Model ID |
|---|---|---|
| Sanas | SE2.1 | SE2.1 |
| Sanas | VI_G_NC3.0 | VI_G_NC3.0 |
| Krisp | Krisp VIVA Tel v2 | krisp-viva-tel-v2 |
| AiCoustics | Quail-VF-L (16kHz) | quail-vf-2.1-l-16khz |
| AiCoustics | Quail-VF-S (16kHz) | quail-vf-2.1-s-16khz |
| AiCoustics | Rook Large (48kHz) | rook-l-48khz |
| AiCoustics | Rook Small (48kHz) | rook-s-48khz |
| AiCoustics | Rook Large (8kHz) | rook-l-8khz |
| AiCoustics | Rook Small (8kHz) | rook-s-8khz |
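Several of the AI-Coustics model IDs above end in a `-<N>khz` suffix that matches their sample rate. Assuming that naming pattern holds (it is an observation from the table, not a documented guarantee), a small helper can derive the `sample_rate` value to pass alongside a `model_id`. `sample_rate_from_model_id` is a hypothetical name used only for this sketch.

```python
import re

# Matches a trailing "-<digits>khz" such as "-16khz" or "-48khz".
_KHZ_SUFFIX = re.compile(r"-(\d+)khz$")


def sample_rate_from_model_id(model_id):
    """Return the sample rate in Hz encoded in a model ID suffix, or None.

    Hypothetical helper; relies on the "-<N>khz" naming pattern only.
    """
    match = _KHZ_SUFFIX.search(model_id.lower())
    return int(match.group(1)) * 1000 if match else None


for model_id in ("quail-vf-2.1-l-16khz", "rook-s-48khz", "rook-l-8khz", "VI_G_NC3.0"):
    print(model_id, sample_rate_from_model_id(model_id))
```

IDs without the suffix, such as the Sanas and Krisp entries, return `None`, so the sample rate for those models still has to be set explicitly.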
Inference Pipeline Integration
from videosdk.agents import Agent, Pipeline, AgentSession
from videosdk.agents.plugins import DeepgramSTT, OpenAILLM, ElevenLabsTTS, SileroVAD
from videosdk.agents.inference import AICousticsDenoise
pipeline = Pipeline(
    stt=DeepgramSTT(sample_rate=48000),
    llm=OpenAILLM(),
    tts=ElevenLabsTTS(),
    vad=SileroVAD(input_sample_rate=48000),
    # Server-side noise cancellation via VideoSDK Inference Gateway
    denoise=AICousticsDenoise(model_id="sparrow-xxs-48khz"),
)
Audio Processing Flow
When noise removal is enabled, audio processing follows this flow:
- Raw Audio Input: Microphone captures audio with background noise
- Noise Removal: RNNoise filters out unwanted sounds
- Enhanced Audio: Clean audio is passed to speech processing
- Improved Results: Better transcription and conversation quality
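The ordering in this flow can be sketched with toy stand-ins. Nothing below is RNNoise or VideoSDK code: `toy_denoise` is a crude amplitude gate and `toy_speech_processing` merely counts samples that still carry signal. Both are hypothetical and exist only to show that denoising runs before speech processing.

```python
def toy_denoise(samples, threshold=0.05):
    """Stand-in denoiser: zero out low-amplitude samples (a crude noise gate)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def toy_speech_processing(samples):
    """Stand-in for speech processing: count samples that still carry signal."""
    return sum(1 for s in samples if s != 0.0)


def process(raw_samples, denoise=None):
    """Run the optional denoise stage first, then hand audio to speech processing."""
    clean = denoise(raw_samples) if denoise else raw_samples
    return toy_speech_processing(clean)


raw = [0.01, -0.02, 0.6, -0.5, 0.03, 0.4]   # quiet noise mixed with louder "speech"
print(process(raw))                          # 6: noise reaches speech processing
print(process(raw, denoise=toy_denoise))     # 3: only the louder samples remain
```

In the real pipeline the same idea applies: the stage passed as `denoise` sees the raw audio first, and downstream components receive only its output.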
Example - Try Out Yourself

