Background Audio
The Background Audio feature lets voice agents play audio during a conversation, such as ambient sound or feedback while the agent is processing. Two kinds of audio are available:
- Thinking Audio: Plays automatically during agent processing (e.g., keyboard typing sounds)
- Background Audio: Plays on-demand for ambient music or soundscapes


Getting Started
Enable Background Audio
from videosdk.agents import RoomOptions, JobContext

room_options = RoomOptions(
    room_id="your-room-id",
    name="My Agent",
    background_audio=True  # Enable background audio support
)

context = JobContext(room_options=room_options)
Agent Methods
1. Set Thinking Audio
set_thinking_audio(): Configures audio that plays automatically while the agent processes responses.
Parameters:
- file (str, optional): Path to a custom WAV audio file. If not provided, uses the built-in agent_keyboard.wav.
- volume (float, optional): Volume of the audio. Default: 0.3
Example:
class MyAgent(Agent):
    def __init__(self):
        super().__init__(instructions="...")
        # Use default keyboard sound
        self.set_thinking_audio()
        # Or use custom audio
        # self.set_thinking_audio(file="path/to/custom.wav")
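If no custom file is at hand, one can be generated for testing. The following is a minimal sketch that uses only Python's standard library and is not part of the VideoSDK API. The function name write_tone_wav and its default values are hypothetical, and the 16-bit PCM, 16 kHz, mono layout follows the recommendation in Audio File Requirements.

```python
import math
import struct
import wave


def write_tone_wav(path, freq_hz=440.0, duration_s=0.5, amplitude=0.2,
                   sample_rate=16000):
    """Write a mono, 16-bit PCM sine tone to `path` and return the frame count.

    Hypothetical helper for producing a test file; not part of any SDK.
    """
    n_frames = int(duration_s * sample_rate)
    peak = int(amplitude * 32767)  # scale to the signed 16-bit range
    frames = bytearray()
    for i in range(n_frames):
        sample = int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)  # 16 kHz by default
        wav.writeframes(bytes(frames))
    return n_frames
```

The path written by this helper could then be passed as the file argument of set_thinking_audio().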
2. Play Background Audio
play_background_audio(): Starts playing background audio during the conversation.
Parameters:
- file (str, optional): Path to a custom WAV audio file. If not provided, uses the built-in classical.wav.
- looping (bool, optional): Whether to loop the audio. Default: False
- override_thinking (bool, optional): Whether to stop thinking audio when background audio starts. Default: True
- volume (float, optional): Volume of the audio. Default: 1.0
Example:
@function_tool
async def play_music(self):
    """Plays background music"""
    await self.play_background_audio(
        looping=True,
        override_thinking=False
    )
    return "Music started"
3. Stop Background Audio
stop_background_audio(): Stops currently playing background audio.
Example:
@function_tool
async def stop_music(self):
    """Stops background music"""
    await self.stop_background_audio()
    return "Music stopped"
Complete Example
main.py
from videosdk.agents import (
    Agent, AgentSession, CascadingPipeline,
    WorkerJob, ConversationFlow, JobContext,
    RoomOptions, function_tool
)
from videosdk.plugins.openai import OpenAILLM, OpenAITTS
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector

class MusicAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant. Use control_music to play or stop background music."
        )
        # Enable thinking audio with default keyboard sound
        self.set_thinking_audio()

    async def on_enter(self):
        await self.session.say("Hello! Ask me to play music.")

    @function_tool
    async def control_music(self, action: str):
        """
        Controls background music.
        :param action: 'play' to start music, 'stop' to end it
        """
        if action == "play":
            await self.play_background_audio(
                override_thinking=True,
                looping=True
            )
            return "Music started"
        elif action == "stop":
            await self.stop_background_audio()
            return "Music stopped"
        return "Invalid action"

async def entrypoint(ctx: JobContext):
    agent = MusicAgent()
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(),
        llm=OpenAILLM(),
        tts=OpenAITTS(),
        vad=SileroVAD(),
        turn_detector=TurnDetector()
    )
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=ConversationFlow(agent)
    )
    await ctx.run_until_shutdown(session=session)


def make_context():
    return JobContext(
        room_options=RoomOptions(
            room_id="<room_id>",
            name="Music Agent",
            background_audio=True  # Required!
        )
    )


if __name__ == "__main__":
    job = WorkerJob(entrypoint=entrypoint, jobctx=make_context)
    job.start()
Pipeline Support
Background audio works with both pipeline types:
Cascading Pipeline
- Thinking audio plays automatically during LLM processing
- Background audio can be controlled via agent methods
- Audio stops automatically when the agent speaks
RealTime Pipeline
- Full background audio support with streaming models
- Automatic lifecycle management during conversation turns
Audio Behavior
| Feature | Thinking Audio | Background Audio |
|---|---|---|
| Trigger | Automatic during processing | Manual via play_background_audio() |
| Default File | agent_keyboard.wav | classical.wav |
| Typical Duration | Short (during LLM call) | Long/continuous |
| Looping | Optional | Recommended (looping=True) |
| User Control | No | Yes (via function tools) |
| Stops When | Agent speaks | Agent speaks or stop_background_audio() |
Audio File Requirements
- Format: WAV (.wav)
- Recommended: 16-bit PCM, 16 kHz sample rate, mono channel
- Built-in files:
  - agent_keyboard.wav: Default thinking sound
  - classical.wav: Default background music
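A custom file can be checked against the recommended format before it is handed to the agent. The sketch below uses only Python's standard wave module; the names RECOMMENDED and check_wav are hypothetical and not part of the VideoSDK API. Note that wave.open raises wave.Error for files that are not uncompressed PCM, which is itself a sign the file should be re-exported.

```python
import wave

# Recommended format from this page: mono, 16-bit (2 bytes), 16 kHz.
RECOMMENDED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def check_wav(path):
    """Return a list of mismatches against RECOMMENDED (empty if none)."""
    with wave.open(path, "rb") as wav:
        actual = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }
    return [
        f"{key}: expected {expected}, found {actual[key]}"
        for key, expected in RECOMMENDED.items()
        if actual[key] != expected
    ]
```

An empty list means the header matches the recommendation. The format is only recommended, so a non-empty list is a hint to convert the file rather than proof that playback will fail.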
Best Practices
- Always enable in RoomOptions: Set background_audio=True before using audio methods
- Use override_thinking=True: When playing music to avoid overlapping sounds
- Loop background audio: Set looping=True for continuous ambient sounds
- Control via function tools: Let users control music through natural language
- Clean audio files: Use high-quality WAV files to avoid distortion
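One rough way to screen a file for distortion is to measure its peak sample level, since samples that reach full scale often indicate clipping. This is a heuristic sketch using only the standard library, not an SDK feature; the function name peak_level is hypothetical, and it assumes a 16-bit PCM file.

```python
import array
import sys
import wave


def peak_level(path):
    """Return the peak absolute sample of a 16-bit PCM WAV as a fraction of full scale."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768
```

A result at or very near 1.0 suggests the recording may be clipped and is worth re-exporting at a lower gain.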
Common Use Cases
- Music player agent: Control playback through conversation
- Ambient soundscapes: Create atmosphere during interactions
- Processing feedback: Custom thinking sounds for different agent personalities
- Hold music: Play audio while agent performs long operations
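The hold music case can be sketched as a small wrapper that starts looping audio, runs a long operation, and stops the audio whether the operation succeeds or raises. StubAgent below is a hypothetical stand-in that only records calls, so the sketch runs without the SDK; it mirrors the play_background_audio() and stop_background_audio() names and parameters described on this page. with_hold_music and slow_lookup are likewise illustrative names.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for an Agent; it only records which methods ran."""

    def __init__(self):
        self.calls = []

    async def play_background_audio(self, looping=False, override_thinking=True):
        self.calls.append("play")

    async def stop_background_audio(self):
        self.calls.append("stop")


async def with_hold_music(agent, operation):
    """Play looping audio while `operation` runs, then always stop it."""
    await agent.play_background_audio(looping=True, override_thinking=True)
    try:
        return await operation()
    finally:
        # Executes whether the operation returned or raised.
        await agent.stop_background_audio()


async def slow_lookup():
    await asyncio.sleep(0.01)  # placeholder for a long-running task
    return "done"
```

Inside a real agent, the same try/finally shape could wrap the long operation in a function tool, with self in place of the stub. Putting the stop call in finally helps avoid the case where audio keeps playing after an error.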
Troubleshooting
| Issue | Solution |
|---|---|
| Audio not playing | Verify background_audio=True in RoomOptions |
| Audio quality issues | Use WAV format with 16-bit PCM encoding |
| Audio doesn't stop | Ensure stop_background_audio() is called properly |
| Overlapping sounds | Use override_thinking=True when playing background audio |

