De-noise

De-noise improves audio quality in your AI agent conversations by filtering out background noise. This creates more professional and engaging interactions, especially in noisy environments.

Overview

The VideoSDK Agents framework provides real-time audio denoising through the RNNoise plugin. These capabilities:

  • Remove Background Noise: Filters out ambient sounds, keyboard typing, air conditioning, and other distractions
  • Enhance Voice Clarity: Improves speech intelligibility and quality
  • Work in Real-time: Processes audio with minimal latency during live conversations
  • Integrate Seamlessly: Works with both CascadingPipeline and RealTimePipeline architectures

What De-noise Solves

Without noise removal, your agents may struggle with:

  • Poor audio quality affecting transcription accuracy
  • Background noise interfering with conversation flow
  • Unprofessional sound quality in business applications
  • Difficulty understanding users in noisy environments

With De-noise, you get:

  • Cleaner audio for a better user experience
  • Improved speech-to-text accuracy
  • Professional-grade audio quality
  • Better performance in various acoustic environments

RNNoise Implementation

RNNoise is a real-time noise suppression library that uses deep learning to distinguish between speech and noise, providing effective background noise removal.

Key Features

  • Real-time Processing: Low-latency noise removal suitable for live conversations
  • Adaptive Filtering: Automatically adjusts to different types of background noise
  • Speech Preservation: Maintains voice quality while removing unwanted sounds
  • Lightweight: Efficient processing with minimal computational overhead
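
To make the low-latency claim concrete: the upstream RNNoise library works on 48 kHz mono audio in fixed frames of 480 samples, which is 10 ms of audio per frame. The sketch below only illustrates that framing arithmetic in plain Python. The helper names `iter_frames` and `frame_latency_ms` are hypothetical, and this page does not say whether the VideoSDK plugin exposes frame handling, so treat it as background rather than plugin API.

```python
from typing import Iterator, List

# Frame parameters of the upstream RNNoise library (not of the VideoSDK plugin API).
FRAME_SIZE = 480       # samples per frame
SAMPLE_RATE = 48_000   # Hz


def iter_frames(samples: List[float], frame_size: int = FRAME_SIZE) -> Iterator[List[float]]:
    """Yield fixed-size frames, zero-padding the final partial frame (hypothetical helper)."""
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if len(frame) < frame_size:
            # Pad the tail with silence so every frame has the same length.
            frame = frame + [0.0] * (frame_size - len(frame))
        yield frame


def frame_latency_ms(frame_size: int = FRAME_SIZE, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of one frame in milliseconds, i.e. the minimum buffering delay."""
    return 1000.0 * frame_size / sample_rate


if __name__ == "__main__":
    audio = [0.1] * 1000  # 1000 samples of dummy audio
    frames = list(iter_frames(audio))
    print(len(frames), "frames of", len(frames[0]), "samples")
    print("per-frame buffering delay:", frame_latency_ms(), "ms")
```

Because a frame-based denoiser must buffer at least one full frame before it can process it, the frame duration sets a floor on the latency that denoising adds.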

Basic Setup

from videosdk.plugins.rnnoise import RNNoise  

# Initialize noise removal
denoise = RNNoise()

Pipeline Integration

Add noise removal to your cascading pipeline:

main.py
from videosdk.agents import Agent, CascadingPipeline, AgentSession  
from videosdk.plugins.rnnoise import RNNoise
# Add your preferred providers
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.silero import SileroVAD

class EnhancedVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a professional assistant with crystal-clear audio quality. Help users with their questions while maintaining excellent conversation flow."
        )

    async def on_enter(self):
        await self.session.say("Hello! I'm here with enhanced audio quality for our conversation.")

    async def on_exit(self):
        await self.session.say("Goodbye! It was great talking with you.")

# Set up pipeline with noise removal
pipeline = CascadingPipeline(
    stt=DeepgramSTT(api_key="your-deepgram-key"),
    llm=OpenAILLM(api_key="your-openai-key", model="gpt-4"),
    tts=ElevenLabsTTS(api_key="your-elevenlabs-key", voice_id="your-voice-id"),
    vad=SileroVAD(),
    denoise=RNNoise(),  # Enable noise removal
)

# Create and start session
async def main():
    session = AgentSession(agent=EnhancedVoiceAgent(), pipeline=pipeline)
    await session.start()

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
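
The example above passes placeholder strings such as "your-deepgram-key" directly to each provider. One common alternative is to read the keys from environment variables so they stay out of source control. The sketch below is a minimal, framework-independent helper. The variable names (`DEEPGRAM_API_KEY`, `OPENAI_API_KEY`, `ELEVENLABS_API_KEY`) and the functions `require_env` and `load_provider_keys` are assumptions made for illustration, not names defined by VideoSDK.

```python
import os
from typing import Dict, Iterable

# Assumed variable names, chosen for illustration only.
DEFAULT_KEY_NAMES = ("DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing early if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def load_provider_keys(names: Iterable[str] = DEFAULT_KEY_NAMES) -> Dict[str, str]:
    """Collect all provider keys in one place before the pipeline is built."""
    return {name: require_env(name) for name in names}
```

The returned values could then be passed as the `api_key` arguments shown above, for example `DeepgramSTT(api_key=keys["DEEPGRAM_API_KEY"])`.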

Audio Processing Flow

When noise removal is enabled, audio processing follows this flow:

  1. Raw Audio Input: Microphone captures audio with background noise
  2. Noise Removal: RNNoise filters out unwanted sounds
  3. Enhanced Audio: Clean audio is passed to speech processing
  4. Improved Results: Better transcription and conversation quality
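
The four steps above can be sketched as a small function chain. This is a toy model of the ordering only: `noise_gate` is a crude amplitude threshold standing in for RNNoise (it is not the RNNoise algorithm), and `fake_transcribe` is a hypothetical placeholder for a speech-to-text stage. Neither name comes from the VideoSDK API.

```python
from typing import Callable, List, Optional

Frame = List[float]


def noise_gate(frame: Frame, threshold: float = 0.05) -> Frame:
    """Stand-in denoiser: silence samples quieter than the threshold. NOT RNNoise."""
    return [s if abs(s) >= threshold else 0.0 for s in frame]


def fake_transcribe(frame: Frame) -> str:
    """Hypothetical speech stage: reports how many samples are non-silent."""
    active = sum(1 for s in frame if s != 0.0)
    return f"{active} active samples"


def process(
    frame: Frame,
    denoise: Optional[Callable[[Frame], Frame]],
    transcribe: Callable[[Frame], str],
) -> str:
    """Steps 1-4: raw input -> optional noise removal -> speech processing -> result."""
    clean = denoise(frame) if denoise is not None else frame
    return transcribe(clean)


if __name__ == "__main__":
    raw = [0.5, 0.01, -0.02, -0.6, 0.03]  # two loud samples plus low-level noise
    print("without denoise:", process(raw, None, fake_transcribe))
    print("with denoise:   ", process(raw, noise_gate, fake_transcribe))
```

The point of the sketch is the ordering: noise removal runs before any speech processing, so every downstream stage only ever sees the cleaned audio.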
