De-noise

De-noise improves audio quality in your AI agent conversations by filtering out background noise. This creates more professional and engaging interactions, especially in noisy environments.

Overview

The VideoSDK Agents framework provides real-time audio denoising through the RNNoise plugin, which can:

Remove Background Noise: Filters out ambient sounds, keyboard typing, air conditioning, and other distractions
Enhance Voice Clarity: Improves speech intelligibility and quality
Work in Real-time: Processes audio with minimal latency during live conversations
Integrate Seamlessly: Works with both CascadingPipeline and RealTimePipeline architectures

What De-noise Solves

Without noise removal, your agents may struggle with:

Poor audio quality affecting transcription accuracy
Background noise interfering with conversation flow
Unprofessional sound quality in business applications
Difficulty understanding users in noisy environments

With De-noise, you get:

Clearer audio for a better user experience
Improved speech-to-text accuracy
Professional-grade audio quality
More consistent performance across different acoustic environments

RNNoise Implementation

RNNoise is a real-time noise suppression library that uses deep learning to distinguish between speech and noise, providing effective background noise removal.

Key Features

Real-time Processing: Low-latency noise removal suitable for live conversations
Adaptive Filtering: Automatically adjusts to different types of background noise
Speech Preservation: Maintains voice quality while removing unwanted sounds
Lightweight: Efficient processing with minimal computational overhead
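
To make the real-time claim concrete: the upstream RNNoise library works on short fixed-size frames, 10 ms of 48 kHz mono audio (480 samples) at a time, which is what keeps the added latency small. The sketch below shows one way to cut an incoming 16-bit PCM byte stream into such frames. It is only an illustration; `split_into_frames` is a hypothetical helper, and this page does not describe how the VideoSDK plugin buffers or resamples audio internally.

```python
import array

SAMPLE_RATE_HZ = 48_000          # sample rate upstream RNNoise expects
FRAME_MS = 10                    # frame duration upstream RNNoise processes
FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 480 samples per frame


def split_into_frames(pcm: bytes, frame_samples: int = FRAME_SAMPLES):
    """Split 16-bit mono PCM bytes into fixed-size frames.

    Returns (frames, remainder): the complete frames as array('h') objects,
    plus the leftover bytes that should be prepended to the next chunk.
    """
    frame_bytes = frame_samples * 2  # 2 bytes per 16-bit sample
    usable = len(pcm) - (len(pcm) % frame_bytes)
    frames = []
    for start in range(0, usable, frame_bytes):
        frame = array.array("h")
        frame.frombytes(pcm[start:start + frame_bytes])
        frames.append(frame)
    return frames, pcm[usable:]
```

Carrying the remainder over to the next chunk means no samples are dropped when network packets do not align with the 10 ms frame boundary.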

Basic Setup

from videosdk.plugins.rnnoise import RNNoise  
  
# Initialize noise removal 
denoise = RNNoise()

Pipeline Integration

Cascading Pipeline

Add noise removal to your cascading pipeline:

main.py
from videosdk.agents import Agent, CascadingPipeline, AgentSession  
from videosdk.plugins.rnnoise import RNNoise  
# Add your preferred providers  
from videosdk.plugins.deepgram import DeepgramSTT  
from videosdk.plugins.openai import OpenAILLM  
from videosdk.plugins.elevenlabs import ElevenLabsTTS  
from videosdk.plugins.silero import SileroVAD  
  
class EnhancedVoiceAgent(Agent):  
    def __init__(self):  
        super().__init__(  
            instructions="You are a professional assistant with crystal-clear audio quality. Help users with their questions while maintaining excellent conversation flow."  
        )  
  
    async def on_enter(self):  
        await self.session.say("Hello! I'm here with enhanced audio quality for our conversation.")  
  
    async def on_exit(self):  
        await self.session.say("Goodbye! It was great talking with you.")  
  
# Set up pipeline with noise removal  
pipeline = CascadingPipeline(  
    stt=DeepgramSTT(api_key="your-deepgram-key"),  
    llm=OpenAILLM(api_key="your-openai-key", model="gpt-4"),  
    tts=ElevenLabsTTS(api_key="your-elevenlabs-key", voice_id="your-voice-id"),  
    vad=SileroVAD(),  
    denoise=RNNoise()  # Enable noise removal  
)  
  
# Create and start session  
async def main():  
    session = AgentSession(agent=EnhancedVoiceAgent(), pipeline=pipeline)  
    await session.start()  
  
if __name__ == "__main__":  
    import asyncio  
    asyncio.run(main())

Real-time Pipeline

Integrate noise removal with real-time models:

main.py
from videosdk.agents import Agent, RealTimePipeline, AgentSession
from videosdk.plugins.rnnoise import RNNoise  
from videosdk.plugins.openai import OpenAIRealtime  
  
class EnhancedRealtimeAgent(Agent):  
    def __init__(self):  
        super().__init__(  
            instructions="You are a professional assistant with crystal-clear audio quality. Engage in natural, real-time conversations while providing helpful responses."  
        )  
  
    async def on_enter(self):  
        await self.session.say("Hello! I'm ready for a real-time conversation with enhanced audio quality.")  
  
    async def on_exit(self):  
        await self.session.say("Thank you for the conversation! Take care.")  
  
# Set up real-time model  
model = OpenAIRealtime(  
    model="gpt-4o-realtime-preview",  
    api_key="your-openai-key",  
    voice="alloy"  # Choose from: alloy, echo, fable, onyx, nova, shimmer  
)  
  
# Set up pipeline with noise removal  
pipeline = RealTimePipeline(  
    model=model,  
    denoise=RNNoise()  # Enable noise removal  
)  
  
# Create and start session  
async def main():  
    session = AgentSession(agent=EnhancedRealtimeAgent(), pipeline=pipeline)  
    await session.start()  
  
if __name__ == "__main__":  
    import asyncio  
    asyncio.run(main())
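
If you want to compare agent behaviour with and without noise removal, one option is to make the `denoise` argument conditional. The helper below is a minimal sketch, not part of the VideoSDK API: `pipeline_kwargs` and the `AGENT_DENOISE` environment variable are hypothetical names, and it assumes the pipeline constructors accept being called without a `denoise` argument.

```python
import os


def pipeline_kwargs(base: dict, denoise_factory, env_var: str = "AGENT_DENOISE") -> dict:
    """Return pipeline keyword arguments, adding `denoise` only when enabled.

    AGENT_DENOISE is a hypothetical variable name. Noise removal stays on
    unless the variable is set to "0", "false", "off" or "no".
    """
    kwargs = dict(base)  # copy so the caller's dict is not mutated
    flag = os.environ.get(env_var, "1").strip().lower()
    if flag not in {"0", "false", "off", "no"}:
        kwargs["denoise"] = denoise_factory()
    return kwargs
```

With the real-time example above, usage would look like `pipeline = RealTimePipeline(**pipeline_kwargs({"model": model}, RNNoise))`. The denoiser is only constructed when the flag is on, so nothing is instantiated when it is disabled.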

Audio Processing Flow

When noise removal is enabled, audio processing follows this flow:

Raw Audio Input: Microphone captures audio with background noise
Noise Removal: RNNoise filters out unwanted sounds
Enhanced Audio: Clean audio is passed to speech processing
Improved Results: Better transcription and conversation quality
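
The ordering of these stages can be sketched in a few lines. The code below is only an illustration of the flow: `toy_noise_gate` is a simple energy threshold standing in for RNNoise (it is not how RNNoise works), and `process` and `downstream` are hypothetical names rather than VideoSDK functions.

```python
import math
from typing import Callable, Iterable, List

Frame = List[float]


def rms(frame: Frame) -> float:
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def toy_noise_gate(frame: Frame, threshold: float = 0.05) -> Frame:
    """Stand-in for RNNoise: silence frames whose level is below threshold."""
    return frame if rms(frame) >= threshold else [0.0] * len(frame)


def process(frames: Iterable[Frame],
            denoise: Callable[[Frame], Frame],
            downstream: Callable[[Frame], None]) -> None:
    """Run each captured frame through denoising before speech processing."""
    for frame in frames:
        downstream(denoise(frame))
```

The point of the sketch is that speech processing (STT or a real-time model) only ever sees the output of the denoising step, never the raw microphone frames.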

Example - Try It Yourself

Enhanced Pronunciation Example

Check out the example with enhanced voice and noise removal.
