
Utterance Handle

UtteranceHandle is a lifecycle management class for agent utterances in the videosdk-agents framework. It solves two critical problems:

  • Preventing overlapping text-to-speech (TTS) output
  • Enabling graceful interruption handling when users speak during agent responses

Both are essential for natural conversational experiences in which an agent generates multiple sequential speech outputs without audio overlap.

Core Concepts

Lifecycle Management

Each UtteranceHandle instance tracks a single utterance from creation through completion. The handle manages state transitions automatically as the conversation progresses.

Completion States

An utterance can complete in two ways:

  1. Natural Completion: The TTS finishes playing the audio
  2. User Interruption: The user starts speaking, triggering an interruption

Awaitable Pattern

The handle is compatible with Python's async/await syntax. This allows you to write sequential speech code that waits for each utterance to complete before starting the next one.
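
One way to picture the awaitable pattern is a handle whose __await__ delegates to an asyncio.Future that resolves on either completion path. The sketch below is a hypothetical stand-in and not the videosdk-agents implementation: SketchHandle, mark_played(), and demo() are names invented for illustration, and only id, done(), interrupted, interrupt(), and __await__() mirror the API described on this page.

```python
import asyncio


class SketchHandle:
    """Hypothetical stand-in for an utterance handle (not the library class)."""

    def __init__(self, utterance_id: str) -> None:
        self.id = utterance_id
        self._interrupted = False
        # Resolved exactly once, on natural completion or on interruption.
        self._done_fut = asyncio.get_running_loop().create_future()

    def done(self) -> bool:
        return self._done_fut.done()

    @property
    def interrupted(self) -> bool:
        return self._interrupted

    def interrupt(self) -> None:
        # Interruption path: flag the handle, then release any waiters.
        self._interrupted = True
        self._finish()

    def mark_played(self) -> None:
        # Assumed hook: would be called when TTS playback ends naturally.
        self._finish()

    def _finish(self) -> None:
        if not self._done_fut.done():
            self._done_fut.set_result(None)

    def __await__(self):
        # "await handle" suspends until the future is resolved.
        return self._done_fut.__await__()


async def demo() -> tuple:
    natural = SketchHandle("utt-1")
    cut_short = SketchHandle("utt-2")
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, natural.mark_played)  # simulated end of playback
    loop.call_later(0.01, cut_short.interrupt)  # simulated user interruption
    await natural
    await cut_short
    return natural.interrupted, cut_short.interrupted


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch both completion paths resolve the same future, so code that awaits the handle resumes in either case and can then read interrupted to tell the two apart.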

API Reference

Properties

Property/MethodReturn TypeDescription
idstrUnique identifier for the utterance
done()boolReturns True if utterance is complete
interruptedboolReturns True if user interrupted
interrupt()NoneManually marks utterance as interrupted
await()GeneratorEnables awaiting the handle

Methods

  • interrupt(): Manually marks the utterance as interrupted
  • __await__(): Enables awaiting the handle to wait for completion

Usage Patterns

Sequential Speech

To prevent overlapping TTS, await each handle before starting the next utterance:

# Correct approach  
handle1 = self.session.say(f"The current temperature is {temperature}°C.")
await handle1 # Wait for first utterance to complete

handle2 = self.session.say("Do you live in this city?")
await handle2 # Wait for second utterance to complete
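
When a tool speaks more than two sentences, the same await-each-handle idea can be wrapped in a small helper. This is a minimal sketch under stated assumptions: say_in_order() is a hypothetical helper, stopping early on interruption is a design choice rather than documented SDK behavior, and StubSession and StubHandle are stand-ins so the code runs without the SDK.

```python
import asyncio
from typing import Iterable, Optional


async def say_in_order(session, sentences: Iterable[str]) -> int:
    """Speak sentences one at a time; return how many were started."""
    started = 0
    for text in sentences:
        handle = session.say(text)  # say() returns an awaitable handle
        started += 1
        await handle                # wait until playback ends or is interrupted
        if handle.interrupted:      # assumed policy: stop once the user cuts in
            break
    return started


class StubHandle:
    """Stand-in handle: completes on the next event-loop iteration."""

    def __init__(self, interrupted: bool) -> None:
        self.interrupted = interrupted

    def __await__(self):
        return asyncio.sleep(0).__await__()


class StubSession:
    """Stand-in session that records what was spoken."""

    def __init__(self, interrupt_on: Optional[str] = None) -> None:
        self.spoken = []
        self._interrupt_on = interrupt_on

    def say(self, text: str) -> StubHandle:
        self.spoken.append(text)
        return StubHandle(interrupted=(text == self._interrupt_on))


if __name__ == "__main__":
    session = StubSession(interrupt_on="Second")
    count = asyncio.run(say_in_order(session, ["First", "Second", "Third"]))
    print(count, session.spoken)
```

With a real session, the helper would be called from inside a function tool, for example: await say_in_order(self.session, parts).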

Checking Interruption Status

Access the current utterance handle via self.session.current_utterance in function tools to detect interruptions:

utterance: UtteranceHandle | None = self.session.current_utterance

# In long-running operations, check periodically
for i in range(10):
    if utterance and utterance.interrupted:
        logger.info("Task was interrupted by the user.")
        return "The task was cancelled because you interrupted me."

    await asyncio.sleep(1)
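
The polling loop above can be tried without the SDK. The following sketch is an illustration only: run_steps() and demo() are hypothetical names, and a SimpleNamespace with an interrupted attribute stands in for the real handle.

```python
import asyncio
from types import SimpleNamespace


async def run_steps(utterance, steps: int, poll_seconds: float = 0.01) -> str:
    """Do `steps` units of work, checking for interruption before each one."""
    for step in range(steps):
        # The handle may be None, so guard before reading the flag.
        if utterance is not None and utterance.interrupted:
            return f"cancelled at step {step}"
        await asyncio.sleep(poll_seconds)  # stands in for one unit of real work
    return "finished"


async def demo() -> tuple:
    quiet = SimpleNamespace(interrupted=False)
    noisy = SimpleNamespace(interrupted=False)

    first = await run_steps(quiet, steps=3)

    # Start a longer job, then simulate the user speaking partway through.
    task = asyncio.create_task(run_steps(noisy, steps=50))
    await asyncio.sleep(0.03)
    noisy.interrupted = True
    second = await task
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A shorter polling interval makes cancellation more responsive at the cost of more frequent checks.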

Anti-Pattern: Concurrent Speech

Never use asyncio.create_task() for speech that should be sequential, as this causes overlapping audio:

# INCORRECT - causes overlapping speech  
asyncio.create_task(self.session.say(f"The current temperature is {temperature}°C."))
asyncio.create_task(self.session.say("Do you live in this city?"))
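
The difference between the two approaches can be made visible with a small simulation. This is a sketch, not SDK code: TimedSession is a hypothetical stand-in that "plays" each utterance by sleeping and records the playback interval, and it assumes playback starts as soon as say() is called.

```python
import asyncio
import time


class TimedSession:
    """Hypothetical stand-in: each utterance 'plays' for a fixed duration."""

    def __init__(self, duration: float = 0.05) -> None:
        self.duration = duration
        self.intervals = []  # (start, end) of each simulated playback

    def say(self, text: str) -> "asyncio.Task":
        # Playback is scheduled immediately; the task is awaitable like a handle.
        return asyncio.create_task(self._play())

    async def _play(self) -> None:
        start = time.monotonic()
        await asyncio.sleep(self.duration)
        self.intervals.append((start, time.monotonic()))


def overlaps(intervals) -> bool:
    """True if any playback started before the previous one ended."""
    ordered = sorted(intervals)
    return any(nxt[0] < cur[1] for cur, nxt in zip(ordered, ordered[1:]))


async def sequential() -> bool:
    session = TimedSession()
    await session.say("First")   # finishes before "Second" is started
    await session.say("Second")
    return overlaps(session.intervals)


async def fire_and_forget() -> bool:
    session = TimedSession()
    first = session.say("First")
    second = session.say("Second")  # started while "First" is still playing
    await asyncio.gather(first, second)
    return overlaps(session.intervals)


if __name__ == "__main__":
    print("sequential overlaps:", asyncio.run(sequential()))
    print("fire-and-forget overlaps:", asyncio.run(fire_and_forget()))
```

In the simulation, only the version that awaits each utterance before starting the next keeps the playback intervals disjoint.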

Integration with AgentSession

The session.say() method returns an UtteranceHandle instance. During function tool execution, the current utterance is accessible via self.session.current_utterance. The handle's lifecycle is managed automatically by the session, with completion and interruption states updated as the conversation progresses.

Complete Example

@function_tool
async def get_weather(self, latitude: str, longitude: str) -> dict:
    utterance: UtteranceHandle | None = self.session.current_utterance

    # Fetch weather data
    temperature = await fetch_temperature(latitude, longitude)

    # Sequential speech with await
    handle1 = self.session.say(f"The current temperature is {temperature}°C.")
    await handle1

    handle2 = self.session.say("Do you live in this city?")
    await handle2

    # Check if user interrupted
    if utterance and utterance.interrupted:
        return {"response": "Weather request cancelled due to user interruption."}

    return {"response": f"The temperature is {temperature}°C."}

Best Practices

  1. Always await handles when you need sequential speech to prevent audio overlap
  2. Check interrupted status in long-running operations to enable graceful cancellation
  3. Store handle references if you need to check status later in your function
  4. Avoid create_task() for speech that should play sequentially

Common Use Cases

  • Multi-part responses: When function tools need to speak multiple sentences in sequence
  • Long-running operations: Tasks that should be cancellable when users interrupt
  • Conversational flows: Scenarios requiring precise timing between utterances

FAQs

Troubleshooting

Issue                                | Solution
Overlapping speech                   | Use await on handles instead of create_task()
Tasks not cancelling on interruption | Check utterance.interrupted in loops
Handle is None                       | Only available during function tool execution via session.current_utterance

Correct Usage Pattern

✅ Correct: Sequential Speech

Await each handle to prevent overlapping TTS.

handle1 = session.say("First")
await handle1
handle2 = session.say("Second")
await handle2

❌ Incorrect: Concurrent Speech

Using create_task() causes audio overlap.

asyncio.create_task(session.say("First"))
asyncio.create_task(session.say("Second"))
