
Background Audio

The Background Audio feature lets voice agents play audio during a conversation, such as ambient sounds or feedback while the agent is processing. Two kinds of audio are supported:

  1. Thinking Audio: Plays automatically during agent processing (e.g., keyboard typing sounds)
  2. Background Audio: Plays on-demand for ambient music or soundscapes


Getting Started

Enable Background Audio

from videosdk.agents import RoomOptions, JobContext

room_options = RoomOptions(
    room_id="your-room-id",
    name="My Agent",
    background_audio=True,  # Enable background audio support
)

context = JobContext(room_options=room_options)

Agent Methods

1. Set Thinking Audio

set_thinking_audio(): Configures audio that plays automatically while the agent processes responses.

Parameters:

  • file (str, optional): Path to a custom WAV audio file. If not provided, the built-in agent_keyboard.wav is used.
  • volume (float, optional): Volume of the audio. Default: 0.3

Example:

from videosdk.agents import Agent

class MyAgent(Agent):
    def __init__(self):
        super().__init__(instructions="...")
        # Use the default keyboard sound
        self.set_thinking_audio()
        # Or use custom audio at a custom volume
        # self.set_thinking_audio(file="path/to/custom.wav", volume=0.5)
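The docs do not say how volume is applied internally. Here is a rough mental model, assuming volume acts as a linear gain on 16-bit PCM samples (an assumption, not confirmed SDK behavior): a value of 0.3 keeps 30% of each sample's amplitude. The hypothetical helper apply_volume shows that scaling. It clamps results to the 16-bit range so that values above 1.0 cannot overflow:

```python
import struct

def apply_volume(pcm16_le: bytes, volume: float) -> bytes:
    """Scale little-endian 16-bit PCM samples by a linear gain (hypothetical helper,
    assuming `volume` is a plain amplitude multiplier)."""
    n = len(pcm16_le) // 2  # number of complete 2-byte samples
    samples = struct.unpack(f"<{n}h", pcm16_le[: n * 2])
    # Round to the nearest integer and clamp to the int16 range
    scaled = [max(-32768, min(32767, round(s * volume))) for s in samples]
    return struct.pack(f"<{n}h", *scaled)

# Usage: a volume of 0.3 keeps 30% of each sample's amplitude
quiet = apply_volume(struct.pack("<3h", 1000, -2000, 32767), 0.3)
print(struct.unpack("<3h", quiet))  # (300, -600, 9830)
```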

2. Play Background Audio

play_background_audio(): Starts playing background audio during the conversation.

Parameters:

  • file (str, optional): Path to a custom WAV audio file. If not provided, the built-in classical.wav is used.
  • looping (bool, optional): Whether to loop the audio. Default: False
  • override_thinking (bool, optional): Whether to stop thinking audio when background audio starts. Default: True
  • volume (float, optional): Volume of the audio. Default: 1.0

Example:

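from videosdk.agents import Agent, function_tool

class MyAgent(Agent):
    # ... __init__ as shown above ...

    @function_tool
    async def play_music(self):
        """Plays background music"""
        await self.play_background_audio(
            looping=True,
            override_thinking=False
        )
        return "Music started"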
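With looping=True, playback starts over from the beginning when the file ends. The following minimal sketch models that behavior. It assumes audio is streamed in fixed-size frames; the 20 ms frame size and the iter_frames helper are hypothetical and not part of videosdk:

```python
from typing import Iterator

# Hypothetical frame size: 20 ms of 16 kHz, 16-bit mono audio = 320 samples x 2 bytes
FRAME_BYTES = 640

def iter_frames(pcm: bytes, looping: bool, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Yield fixed-size frames of raw PCM; start over at the end if looping is True."""
    if not pcm:
        return  # nothing to play (also prevents an infinite loop when looping)
    while True:
        for start in range(0, len(pcm), frame_bytes):
            # Pad the final partial frame with silence so every frame is the same size
            yield pcm[start:start + frame_bytes].ljust(frame_bytes, b"\x00")
        if not looping:
            return

# Usage: 10 bytes in 4-byte frames -> 3 frames per pass, then it wraps around
gen = iter_frames(bytes(range(10)), looping=True, frame_bytes=4)
print([next(gen) for _ in range(4)][-1])  # b'\x00\x01\x02\x03'
```

Padding the last partial frame keeps every frame the same size. The cost is a few milliseconds of silence at each loop boundary.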

3. Stop Background Audio

stop_background_audio(): Stops currently playing background audio.

Example:

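from videosdk.agents import Agent, function_tool

class MyAgent(Agent):
    # ... __init__ as shown above ...

    @function_tool
    async def stop_music(self):
        """Stops background music"""
        await self.stop_background_audio()
        return "Music stopped"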
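Looping audio never ends on its own, so stopping it amounts to cancelling whatever task is streaming it. The ToyBackgroundPlayer class below is a hypothetical asyncio sketch of that play/stop lifecycle, not the SDK's implementation:

```python
import asyncio
from typing import Optional, Tuple

class ToyBackgroundPlayer:
    """Hypothetical play/stop lifecycle sketch; not the videosdk implementation."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None
        self.frames_sent = 0

    async def _stream(self, looping: bool, total_frames: int) -> None:
        while True:
            for _ in range(total_frames):
                self.frames_sent += 1      # stand-in for pushing one audio frame
                await asyncio.sleep(0.02)  # pace frames at roughly 20 ms
            if not looping:
                break

    async def play(self, looping: bool = False, total_frames: int = 50) -> None:
        await self.stop()  # replace any track that is already playing
        self._task = asyncio.create_task(self._stream(looping, total_frames))

    async def stop(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
            try:
                await self._task  # wait until the cancellation has taken effect
            except asyncio.CancelledError:
                pass
        self._task = None

    @property
    def is_playing(self) -> bool:
        return self._task is not None and not self._task.done()

async def demo() -> Tuple[bool, bool, int]:
    player = ToyBackgroundPlayer()
    await player.play(looping=True, total_frames=3)
    await asyncio.sleep(0.2)  # looping audio keeps going past the 3-frame "file"
    was_playing = player.is_playing
    await player.stop()
    return was_playing, player.is_playing, player.frames_sent

if __name__ == "__main__":
    print(asyncio.run(demo()))
```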

Complete Example

main.py
from videosdk.agents import (
    Agent, AgentSession, CascadingPipeline,
    WorkerJob, ConversationFlow, JobContext,
    RoomOptions, function_tool
)
from videosdk.plugins.openai import OpenAILLM, OpenAITTS
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector

class MusicAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant. Use control_music to play or stop background music."
        )
        # Enable thinking audio with the default keyboard sound
        self.set_thinking_audio()

    async def on_enter(self):
        await self.session.say("Hello! Ask me to play music.")

    @function_tool
    async def control_music(self, action: str):
        """
        Controls background music.
        :param action: 'play' to start music, 'stop' to end it
        """
        if action == "play":
            await self.play_background_audio(
                override_thinking=True,
                looping=True
            )
            return "Music started"
        elif action == "stop":
            await self.stop_background_audio()
            return "Music stopped"
        return "Invalid action"

async def entrypoint(ctx: JobContext):
    agent = MusicAgent()

    pipeline = CascadingPipeline(
        stt=DeepgramSTT(),
        llm=OpenAILLM(),
        tts=OpenAITTS(),
        vad=SileroVAD(),
        turn_detector=TurnDetector()
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=ConversationFlow(agent)
    )

    await ctx.run_until_shutdown(session=session)

def make_context():
    return JobContext(
        room_options=RoomOptions(
            room_id="<room_id>",
            name="Music Agent",
            background_audio=True  # Required for thinking and background audio
        )
    )

if __name__ == "__main__":
    job = WorkerJob(entrypoint=entrypoint, jobctx=make_context)
    job.start()

Pipeline Support

Background audio works with both pipeline types:

Cascading Pipeline

  • Thinking audio plays automatically during LLM processing
  • Background audio can be controlled via agent methods
  • Audio stops automatically when the agent speaks

RealTime Pipeline

  • Background audio is fully supported with streaming models
  • Playback is managed automatically across conversation turns

Audio Behavior

| Feature          | Thinking Audio              | Background Audio                        |
|------------------|-----------------------------|-----------------------------------------|
| Trigger          | Automatic during processing | Manual via play_background_audio()      |
| Default File     | agent_keyboard.wav          | classical.wav                           |
| Typical Duration | Short (during LLM call)     | Long/continuous                         |
| Looping          | Optional                    | Recommended (looping=True)              |
| User Control     | No                          | Yes (via function tools)                |
| Stops When       | Agent speaks                | Agent speaks or stop_background_audio() |
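The rules in the table can be read as a small state model, which helps when reasoning about when sounds overlap. The ToyAudioState class below is a hypothetical illustration only. The event method names are invented for this sketch, and the override_thinking default mirrors the documented default of True:

```python
from dataclasses import dataclass

@dataclass
class ToyAudioState:
    """Toy model of the Audio Behavior table; not SDK code."""
    thinking: bool = False    # thinking audio currently playing
    background: bool = False  # background audio currently playing

    def processing_started(self) -> None:
        # Thinking audio is triggered automatically while the LLM works
        self.thinking = True

    def play_background(self, override_thinking: bool = True) -> None:
        self.background = True
        if override_thinking:
            # override_thinking=True stops thinking audio when background audio starts
            self.thinking = False

    def stop_background(self) -> None:
        self.background = False

    def agent_speaks(self) -> None:
        # Per the table, both kinds of audio stop when the agent speaks
        self.thinking = False
        self.background = False

# Usage: without override_thinking, the two sounds overlap until the agent speaks
state = ToyAudioState()
state.processing_started()
state.play_background(override_thinking=False)
print(state)  # ToyAudioState(thinking=True, background=True)
state.agent_speaks()
print(state)  # ToyAudioState(thinking=False, background=False)
```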

Audio File Requirements

  • Format: WAV (.wav)
  • Recommended: 16-bit PCM, 16 kHz sample rate, mono channel
  • Built-in files:
    • agent_keyboard.wav: Default thinking sound
    • classical.wav: Default background music
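Python's standard wave module is enough to produce a test file in the recommended format. The following sketch generates a short sine tone as 16-bit PCM, 16 kHz, mono WAV bytes. make_tone_wav is a hypothetical helper. Its output could be saved to disk and passed as file= to set_thinking_audio() or play_background_audio():

```python
import io
import math
import struct
import wave

def make_tone_wav(seconds: float = 1.0, freq_hz: float = 440.0,
                  sample_rate: int = 16_000, amplitude: float = 0.2) -> bytes:
    """Return WAV bytes for a sine tone as 16-bit PCM, mono (hypothetical helper)."""
    n_samples = int(seconds * sample_rate)
    pcm = b"".join(
        struct.pack("<h", int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)  # 16 kHz by default
        wf.writeframes(pcm)
    return buf.getvalue()

if __name__ == "__main__":
    # Save a half-second beep to disk
    with open("beep.wav", "wb") as f:
        f.write(make_tone_wav(seconds=0.5))
```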

Best Practices

  1. Always enable in RoomOptions: Set background_audio=True before calling any audio method
  2. Use override_thinking=True: Prevents thinking audio from overlapping with music
  3. Loop background audio: Set looping=True for continuous ambient sounds
  4. Control via function tools: Let users control music through natural language
  5. Use clean audio files: High-quality WAV files avoid distortion

Common Use Cases

  • Music player agent: Control playback through conversation
  • Ambient soundscapes: Create atmosphere during interactions
  • Processing feedback: Custom thinking sounds for different agent personalities
  • Hold music: Play audio while the agent performs long operations


FAQs

Troubleshooting
| Issue                | Solution                                                   |
|----------------------|------------------------------------------------------------|
| Audio not playing    | Verify background_audio=True in RoomOptions                |
| Audio quality issues | Use WAV format with 16-bit PCM encoding                    |
| Audio doesn't stop   | Ensure stop_background_audio() is called and awaited       |
| Overlapping sounds   | Use override_thinking=True when playing background audio   |
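For audio quality issues, a quick pre-flight check of a custom file can catch format mismatches before the file reaches the agent. The following sketch uses the standard wave module. check_wav is a hypothetical helper, and its thresholds mirror the recommendations under Audio File Requirements. The wave module only opens uncompressed PCM WAV files, so other encodings are reported as unreadable:

```python
import io
import wave
from typing import BinaryIO, List, Union

def check_wav(source: Union[str, BinaryIO]) -> List[str]:
    """List deviations from the recommended WAV format; an empty list means it matches."""
    problems: List[str] = []
    try:
        # wave only opens uncompressed PCM WAV, so other encodings raise wave.Error
        with wave.open(source, "rb") as wf:
            if wf.getsampwidth() != 2:
                problems.append(f"{8 * wf.getsampwidth()}-bit samples; expected 16-bit PCM")
            if wf.getframerate() != 16_000:
                problems.append(f"{wf.getframerate()} Hz; recommended 16000 Hz")
            if wf.getnchannels() != 1:
                problems.append(f"{wf.getnchannels()} channels; recommended mono")
    except (wave.Error, EOFError) as exc:
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems

# Usage: an in-memory stereo 44.1 kHz file fails two of the checks
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)
    wf.setframerate(44_100)
    wf.writeframes(b"\x00\x00" * 2 * 100)  # 100 stereo frames of silence
print(check_wav(io.BytesIO(buf.getvalue())))
```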
