Utterance Handle
UtteranceHandle is a lifecycle management class for agent utterances in the videosdk-agents framework. It solves two critical problems:
- preventing overlapping text-to-speech (TTS) output
- enabling graceful interruption handling when users speak during agent responses

This is essential for creating natural conversational experiences where agents can generate multiple sequential speech outputs without audio overlap.
Core Concepts
Lifecycle Management
Each UtteranceHandle instance tracks a single utterance from creation through completion. The handle manages state transitions automatically as the conversation progresses.
Completion States
An utterance can complete in two ways:
- Natural Completion: TTS playback runs to the end of the audio
- User Interruption: The user starts speaking, triggering an interruption
Awaitable Pattern
The handle is compatible with Python's async/await syntax. This allows you to write sequential speech code that waits for each utterance to complete before starting the next one.
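To make the awaitable pattern concrete, here is a minimal sketch of how a handle of this shape could be built on `asyncio.Event`. It is an illustration only, not the videosdk-agents implementation: the class name `SketchUtteranceHandle`, the `mark_done()` method, and the `demo_awaitable()` helper are hypothetical names introduced for this example.

```python
import asyncio
import uuid


class SketchUtteranceHandle:
    """Illustrative stand-in; NOT the real videosdk-agents UtteranceHandle."""

    def __init__(self) -> None:
        self.id = f"utt_{uuid.uuid4().hex[:8]}"
        self._done = asyncio.Event()
        self._interrupted = False

    def done(self) -> bool:
        # True once the utterance finished naturally or was interrupted.
        return self._done.is_set()

    @property
    def interrupted(self) -> bool:
        return self._interrupted

    def interrupt(self) -> None:
        # An interruption also completes the utterance.
        self._interrupted = True
        self._done.set()

    def mark_done(self) -> None:
        # Hypothetical hook: called when playback ends naturally.
        self._done.set()

    def __await__(self):
        # Delegating to Event.wait() makes `await handle` block until done.
        return self._done.wait().__await__()


async def demo_awaitable() -> tuple[bool, bool]:
    natural = SketchUtteranceHandle()
    cut_off = SketchUtteranceHandle()
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, natural.mark_done)   # simulated end of playback
    loop.call_later(0.01, cut_off.interrupt)   # simulated user interruption
    await natural
    await cut_off
    return natural.interrupted, cut_off.interrupted


if __name__ == "__main__":
    print(asyncio.run(demo_awaitable()))
```

In this sketch both completion paths release anyone awaiting the handle, and the `interrupted` flag is what distinguishes them afterwards.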
API Reference
Properties
| Property/Method | Return Type | Description |
|---|---|---|
| id | str | Unique identifier for the utterance |
| done() | bool | Returns True if utterance is complete |
| interrupted | bool | Returns True if user interrupted |
| interrupt() | None | Manually marks utterance as interrupted |
| `__await__()` | Generator | Enables awaiting the handle until the utterance completes |
Methods
- interrupt(): Manually marks the utterance as interrupted
- `__await__()`: Enables awaiting the handle to wait for completion
Usage Patterns
Sequential Speech
To prevent overlapping TTS, await each handle before starting the next utterance:
# Correct approach
handle1 = self.session.say(f"The current temperature is {temperature}°C.")
await handle1 # Wait for first utterance to complete
handle2 = self.session.say("Do you live in this city?")
await handle2 # Wait for second utterance to complete
Checking Interruption Status
Access the current utterance handle via self.session.current_utterance in function tools to detect interruptions:
utterance: UtteranceHandle | None = self.session.current_utterance
# In long-running operations, check periodically
for i in range(10):
    if utterance and utterance.interrupted:
        logger.info("Task was interrupted by the user.")
        return "The task was cancelled because you interrupted me."
    await asyncio.sleep(1)
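The polling loop above can be tried outside the framework with a stand-in object. The sketch below assumes only that the handle exposes an `interrupted` attribute; `FakeHandle`, `long_task()`, and `demo_polling()` are hypothetical names for this example and are not part of videosdk-agents.

```python
import asyncio


class FakeHandle:
    """Hypothetical stand-in exposing only the `interrupted` flag."""

    def __init__(self) -> None:
        self.interrupted = False


async def long_task(utterance, steps: int = 50, step_seconds: float = 0.01) -> str:
    # Check the flag between units of work so the task can stop early.
    for _ in range(steps):
        if utterance and utterance.interrupted:
            return "cancelled"
        await asyncio.sleep(step_seconds)  # simulated unit of work
    return "finished"


async def demo_polling() -> tuple[str, str, str]:
    # 1) No interruption: the loop runs to the end.
    uninterrupted = await long_task(FakeHandle(), steps=3)

    # 2) The flag flips mid-task: the next check returns early.
    handle = FakeHandle()
    task = asyncio.create_task(long_task(handle))
    await asyncio.sleep(0.03)
    handle.interrupted = True  # simulated user interruption
    interrupted = await task

    # 3) No handle at all (None): the guard skips the check safely.
    no_handle = await long_task(None, steps=2)
    return uninterrupted, interrupted, no_handle


if __name__ == "__main__":
    print(asyncio.run(demo_polling()))
```

The task only notices the interruption at its next check, so shorter work units between checks give faster cancellation.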
Anti-Pattern: Concurrent Speech
Never use asyncio.create_task() for speech that should be sequential, as this causes overlapping audio:
# INCORRECT - causes overlapping speech
asyncio.create_task(self.session.say(f"The current temperature is {temperature}°C."))
asyncio.create_task(self.session.say("Do you live in this city?"))
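The difference between the two approaches can be observed with a small simulation. This is a sketch under stated assumptions: `FakeSession` is a hypothetical stand-in whose `say()` is modeled as a coroutine that sleeps for the playback duration, which is not necessarily how the real session works. It counts how many utterances are "playing" at the same time.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in; tracks how many utterances play at once."""

    def __init__(self) -> None:
        self.active = 0
        self.max_active = 0

    async def say(self, text: str, seconds: float = 0.02) -> None:
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        await asyncio.sleep(seconds)  # simulated audio playback
        self.active -= 1


async def speak_sequentially() -> int:
    session = FakeSession()
    await session.say("First")   # finishes before the next one starts
    await session.say("Second")
    return session.max_active


async def speak_concurrently() -> int:
    session = FakeSession()
    tasks = [
        asyncio.create_task(session.say("First")),
        asyncio.create_task(session.say("Second")),  # starts while "First" plays
    ]
    await asyncio.gather(*tasks)
    return session.max_active


if __name__ == "__main__":
    print(asyncio.run(speak_sequentially()), asyncio.run(speak_concurrently()))
```

In the simulation, awaiting each call keeps at most one utterance active, while scheduling both as tasks lets the second start before the first has finished.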
Integration with AgentSession
The session.say() method returns an UtteranceHandle instance. During function tool execution, the current utterance is accessible via self.session.current_utterance. The handle's lifecycle is managed automatically by the session, with completion and interruption states updated as the conversation progresses.
Complete Example
@function_tool
async def get_weather(self, latitude: str, longitude: str) -> dict:
    utterance: UtteranceHandle | None = self.session.current_utterance

    # Fetch weather data
    temperature = await fetch_temperature(latitude, longitude)

    # Sequential speech with await
    handle1 = self.session.say(f"The current temperature is {temperature}°C.")
    await handle1
    handle2 = self.session.say("Do you live in this city?")
    await handle2

    # Check if user interrupted
    if utterance and utterance.interrupted:
        return {"response": "Weather request cancelled due to user interruption."}

    return {"response": f"The temperature is {temperature}°C."}
Best Practices
- Always await handles when you need sequential speech to prevent audio overlap
- Check interrupted status in long-running operations to enable graceful cancellation
- Store handle references if you need to check status later in your function
- Avoid create_task() for speech that should play sequentially
Common Use Cases
- Multi-part responses: When function tools need to speak multiple sentences in sequence
- Long-running operations: Tasks that should be cancellable when users interrupt
- Conversational flows: Scenarios requiring precise timing between utterances
FAQs
Troubleshooting
| Issue | Solution |
|---|---|
| Overlapping speech | Use await on handles instead of create_task() |
| Tasks not cancelling on interruption | Check utterance.interrupted in loops |
| Handle is None | session.current_utterance is only set during function tool execution; check for None before using it |
Correct Usage Pattern
✅ Correct: Sequential Speech
Await each handle to prevent overlapping TTS.
handle1 = session.say("First")
await handle1
handle2 = session.say("Second")
await handle2
❌ Incorrect: Concurrent Speech
Using create_task() causes audio overlap.
asyncio.create_task(session.say("First"))
asyncio.create_task(session.say("Second"))

