De-noise
De-noise improves audio quality in your AI agent conversations by filtering out background noise. This creates more professional and engaging interactions, especially in noisy environments.
Overview
The VideoSDK Agents framework provides real-time audio denoising through the RNNoise plugin. It can:
- Remove Background Noise: Filters out ambient sounds, keyboard typing, air conditioning, and other distractions
- Enhance Voice Clarity: Improves speech intelligibility and quality
- Work in Real-time: Processes audio with minimal latency during live conversations
- Integrate Seamlessly: Works with both CascadingPipeline and RealTimePipeline architectures
What De-noise Solves
Without noise removal, your agents may struggle with:
- Poor audio quality affecting transcription accuracy
- Background noise interfering with conversation flow
- Unprofessional sound quality in business applications
- Difficulty understanding users in noisy environments
With De-noise, you get:
- Crystal clear audio for better user experience
- Improved speech-to-text accuracy
- Professional-grade audio quality
- Better performance in various acoustic environments
RNNoise Implementation
RNNoise is a real-time noise suppression library that uses deep learning to distinguish between speech and noise, providing effective background noise removal.
Key Features
- Real-time Processing: Low-latency noise removal suitable for live conversations
- Adaptive Filtering: Automatically adjusts to different types of background noise
- Speech Preservation: Maintains voice quality while removing unwanted sounds
- Lightweight: Efficient processing with minimal computational overhead
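The low latency noted above follows partly from how the upstream RNNoise library consumes audio: it works on 48 kHz mono audio in 10 ms frames of 480 samples. The sketch below shows that framing step only. The helper name split_into_frames is hypothetical, and how the VideoSDK plugin buffers or resamples audio internally is not covered by this page, so treat this as an illustration rather than the plugin's actual code.

```python
import array

SAMPLE_RATE = 48_000  # upstream RNNoise expects 48 kHz mono audio
FRAME_SIZE = 480      # 480 samples at 48 kHz is one 10 ms frame


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Yield fixed-size frames of 16-bit PCM, zero-padding a short final frame."""
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if len(frame) < frame_size:
            # Pad the tail with silence so every frame has the same length.
            frame = frame + array.array("h", [0] * (frame_size - len(frame)))
        yield frame


# 1000 samples of silence split into three frames, the last one padded.
pcm = array.array("h", [0] * 1000)
frames = list(split_into_frames(pcm))
```

Because each frame covers only 10 ms, a frame-based denoiser adds little buffering delay before audio reaches the rest of the pipeline.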
Basic Setup
from videosdk.plugins.rnnoise import RNNoise
# Initialize noise removal
denoise = RNNoise()
Pipeline Integration
- Cascading Pipeline
- Real-time Pipeline
Add noise removal to your cascading pipeline:
main.py
from videosdk.agents import Agent, CascadingPipeline, AgentSession
from videosdk.plugins.rnnoise import RNNoise
# Add your preferred providers
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.silero import SileroVAD

class EnhancedVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a professional assistant with crystal-clear audio quality. Help users with their questions while maintaining excellent conversation flow."
        )

    async def on_enter(self):
        await self.session.say("Hello! I'm here with enhanced audio quality for our conversation.")

    async def on_exit(self):
        await self.session.say("Goodbye! It was great talking with you.")

# Set up pipeline with noise removal
pipeline = CascadingPipeline(
    stt=DeepgramSTT(api_key="your-deepgram-key"),
    llm=OpenAILLM(api_key="your-openai-key", model="gpt-4"),
    tts=ElevenLabsTTS(api_key="your-elevenlabs-key", voice_id="your-voice-id"),
    vad=SileroVAD(),
    denoise=RNNoise()  # Enable noise removal
)

# Create and start session
async def main():
    session = AgentSession(agent=EnhancedVoiceAgent(), pipeline=pipeline)
    await session.start()

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Integrate with real-time models:
main.py
from videosdk.agents import Agent, RealTimePipeline, AgentSession
from videosdk.plugins.rnnoise import RNNoise
from videosdk.plugins.openai import OpenAIRealtime

class EnhancedRealtimeAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a professional assistant with crystal-clear audio quality. Engage in natural, real-time conversations while providing helpful responses."
        )

    async def on_enter(self):
        await self.session.say("Hello! I'm ready for a real-time conversation with enhanced audio quality.")

    async def on_exit(self):
        await self.session.say("Thank you for the conversation! Take care.")

# Set up real-time model
model = OpenAIRealtime(
    model="gpt-4o-realtime-preview",
    api_key="your-openai-key",
    voice="alloy"  # e.g. alloy, echo, shimmer; check OpenAI's Realtime docs for the current voice list
)

# Set up pipeline with noise removal
pipeline = RealTimePipeline(
    model=model,
    denoise=RNNoise()  # Enable noise removal
)

# Create and start session
async def main():
    session = AgentSession(agent=EnhancedRealtimeAgent(), pipeline=pipeline)
    await session.start()

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
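
Both examples pass API keys as string literals for brevity. A common alternative is to read them from environment variables so they stay out of source control. The sketch below uses only the Python standard library. The helper name require_env and the variable names DEEPGRAM_API_KEY and OPENAI_API_KEY are illustrative assumptions, not names defined by VideoSDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing early if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before starting the agent")
    return value


# Hypothetical usage with the constructors shown in the examples:
#   stt = DeepgramSTT(api_key=require_env("DEEPGRAM_API_KEY"))
#   llm = OpenAILLM(api_key=require_env("OPENAI_API_KEY"), model="gpt-4")
```

Failing at startup with a clear message is usually easier to debug than an authentication error raised mid-conversation.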
Audio Processing Flow
When noise removal is enabled, audio processing follows this flow:
- Raw Audio Input: Microphone captures audio with background noise
- Noise Removal: RNNoise filters out unwanted sounds
- Enhanced Audio: Clean audio is passed to speech processing
- Improved Results: Better transcription and conversation quality
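
The flow above can be sketched as a small chain of functions. This is a toy illustration only: noise_gate is a simple amplitude gate standing in for a real denoiser (it is not RNNoise and not the plugin's code), and process and peak are hypothetical helpers that play the roles of the pipeline and the downstream speech-processing stage.

```python
from typing import Callable, Iterable, List

Frame = List[int]  # one frame of 16-bit PCM samples


def noise_gate(frame: Frame, threshold: int = 500) -> Frame:
    """Toy stand-in for a denoiser: zero out samples quieter than the threshold."""
    return [s if abs(s) >= threshold else 0 for s in frame]


def peak(frame: Frame) -> int:
    """Stand-in for downstream processing: report the loudest sample in the frame."""
    return max(abs(s) for s in frame)


def process(frames: Iterable[Frame],
            denoise: Callable[[Frame], Frame],
            consume: Callable[[Frame], int]) -> List[int]:
    """Raw input -> noise removal -> enhanced audio handed to the next stage."""
    return [consume(denoise(frame)) for frame in frames]


raw = [
    [20, -35, 4000, -3800],  # speech-like frame with some low-level hiss
    [10, -15, 12, -8],       # frame containing only low-level noise
]
peaks = process(raw, noise_gate, peak)  # [4000, 0]
```

The point is the ordering: cleaning happens before any speech processing, so later stages only ever see the filtered frames.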
Example - Try Out Yourself