Utterance Handle

UtteranceHandle is a lifecycle management class for agent utterances in the videosdk-agents framework. It solves two problems:

Preventing overlapping text-to-speech (TTS) output
Enabling graceful interruption handling when users speak during agent responses

Both are essential for natural conversational experiences in which an agent produces multiple sequential speech outputs without audio overlap.

Core Concepts

Lifecycle Management

Each UtteranceHandle instance tracks a single utterance from creation through completion. The handle manages state transitions automatically as the conversation progresses.

Completion States

An utterance can complete in two ways:

Natural Completion: The TTS finishes playing the audio
User Interruption: The user starts speaking while the agent is talking, which interrupts the utterance

Awaitable Pattern

The handle is compatible with Python's async/await syntax. This allows you to write sequential speech code that waits for each utterance to complete before starting the next one.
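
To make the awaitable pattern concrete, here is a minimal sketch of how a handle of this kind could be built on an asyncio.Future. SketchHandle and its _mark_done helper are hypothetical names used only for illustration; this is not the videosdk-agents implementation, only a stand-in that mirrors the documented id, done(), interrupted, interrupt(), and __await__() surface.

```python
import asyncio


class SketchHandle:
    """Hypothetical stand-in for UtteranceHandle (not the library's code)."""

    def __init__(self, utterance_id):
        self.id = utterance_id
        self._interrupted = False
        # Resolved once the utterance completes; requires a running event loop.
        self._done_fut = asyncio.get_running_loop().create_future()

    def done(self):
        return self._done_fut.done()

    @property
    def interrupted(self):
        return self._interrupted

    def interrupt(self):
        # An interruption is one of the two ways an utterance completes.
        self._interrupted = True
        self._mark_done()

    def _mark_done(self):
        # Assumed hook that a session would call when playback finishes.
        if not self._done_fut.done():
            self._done_fut.set_result(None)

    def __await__(self):
        # Delegating to the future is what makes `await handle` work.
        return self._done_fut.__await__()


async def demo():
    events = []
    handle = SketchHandle("utt-1")
    # Simulate the TTS finishing shortly after the utterance starts.
    asyncio.get_running_loop().call_later(0.01, handle._mark_done)
    events.append(f"before: done={handle.done()}")
    await handle
    events.append(f"after: done={handle.done()}, interrupted={handle.interrupted}")
    return events
```

Running asyncio.run(demo()) shows the handle moving from not done to done without the interrupted flag being set, which corresponds to natural completion in this sketch.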

API Reference

Properties

Property/Method	Return Type	Description
id	str	Unique identifier for the utterance
done()	bool	Returns True if utterance is complete
interrupted	bool	Returns True if user interrupted
interrupt()	None	Manually marks utterance as interrupted
__await__()	Generator	Enables awaiting the handle until the utterance completes

Methods

interrupt(): Manually marks the utterance as interrupted
__await__(): Enables awaiting the handle to wait for completion

Usage Patterns

Sequential Speech

To prevent overlapping TTS, await each handle before starting the next utterance:

# Correct approach  
handle1 = self.session.say(f"The current temperature is {temperature}°C.")  
await handle1  # Wait for first utterance to complete  
  
handle2 = self.session.say("Do you live in this city?")  
await handle2  # Wait for second utterance to complete

Checking Interruption Status

Access the current utterance handle via self.session.current_utterance in function tools to detect interruptions:

utterance: UtteranceHandle | None = self.session.current_utterance  
  
# In long-running operations, check periodically  
for i in range(10):  
    if utterance and utterance.interrupted:  
        logger.info("Task was interrupted by the user.")  
        return "The task was cancelled because you interrupted me."  
      
    await asyncio.sleep(1)
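
The polling loop above can be tried outside an agent with a stand-in object. The sketch below assumes only that the handle exposes a boolean interrupted attribute; run_cancellable and the SimpleNamespace stand-in are hypothetical names for illustration, and the timer simply simulates a user starting to speak. It also covers the case where no handle is available (None).

```python
import asyncio
from types import SimpleNamespace


async def run_cancellable(utterance, steps, step_delay):
    """Do `steps` units of work, checking for interruption before each one."""
    for step in range(steps):
        # Guard against None: a handle may not be available in every context.
        if utterance is not None and utterance.interrupted:
            return f"cancelled at step {step}"
        await asyncio.sleep(step_delay)  # placeholder for real work
    return "completed"


async def demo():
    # Stand-in exposing only the `interrupted` attribute.
    utterance = SimpleNamespace(interrupted=False)
    # Simulate the user interrupting roughly 25 ms into the task.
    asyncio.get_running_loop().call_later(
        0.025, setattr, utterance, "interrupted", True
    )
    interrupted_result = await run_cancellable(utterance, steps=10, step_delay=0.01)
    none_result = await run_cancellable(None, steps=3, step_delay=0)
    return interrupted_result, none_result
```

Checking the flag once per iteration keeps cancellation latency bounded by the length of a single step, so shorter steps give a more responsive interruption.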

Anti-Pattern: Concurrent Speech

Never use asyncio.create_task() for speech that should be sequential, as this causes overlapping audio:

# INCORRECT - causes overlapping speech  
asyncio.create_task(self.session.say(f"The current temperature is {temperature}°C."))  
asyncio.create_task(self.session.say("Do you live in this city?"))
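
The difference between the two approaches can be observed with a simulated speaker. In the sketch below, fake_say is a hypothetical stand-in that starts "playback" immediately and returns an awaitable; it is not session.say(), and the sleep only imitates audio duration. The log records when each utterance starts and ends.

```python
import asyncio


async def _play(text, log, duration=0.02):
    log.append(("start", text))
    await asyncio.sleep(duration)  # imitates the time the audio takes to play
    log.append(("end", text))


def fake_say(text, log):
    # Hypothetical: playback is scheduled right away and an awaitable is returned.
    return asyncio.ensure_future(_play(text, log))


async def sequential():
    log = []
    await fake_say("First", log)   # finishes before the next one starts
    await fake_say("Second", log)
    return log


async def overlapping():
    log = []
    first = fake_say("First", log)    # started but not awaited
    second = fake_say("Second", log)  # starts while "First" is still playing
    await asyncio.gather(first, second)
    return log
```

In the sequential log each "end" precedes the next "start"; in the overlapping log both "start" entries appear before either "end", which is the audio overlap described above.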

Integration with AgentSession

The session.say() method returns an UtteranceHandle instance. During function tool execution, the current utterance is accessible via self.session.current_utterance. The handle's lifecycle is managed automatically by the session, with completion and interruption states updated as the conversation progresses.

Complete Example

@function_tool  
async def get_weather(self, latitude: str, longitude: str) -> dict:  
    utterance: UtteranceHandle | None = self.session.current_utterance  
      
    # Fetch weather data  
    temperature = await fetch_temperature(latitude, longitude)  
      
    # Sequential speech with await  
    handle1 = self.session.say(f"The current temperature is {temperature}°C.")  
    await handle1  
      
    handle2 = self.session.say("Do you live in this city?")  
    await handle2  
      
    # Check if user interrupted  
    if utterance and utterance.interrupted:  
        return {"response": "Weather request cancelled due to user interruption."}  
      
    return {"response": f"The temperature is {temperature}°C."}

Best Practices

Always await handles when you need sequential speech to prevent audio overlap
Check interrupted status in long-running operations to enable graceful cancellation
Store handle references if you need to check status later in your function
Avoid create_task() for speech that should play sequentially
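
As one way to combine these practices, the sketch below keeps each handle reference, awaits it, and stops speaking further lines once an utterance reports an interruption. speak_in_order and FakeHandle are hypothetical names; the only assumptions about the real handle are that it is awaitable and exposes interrupted, as listed in the API reference.

```python
import asyncio


class FakeHandle:
    """Hypothetical stand-in: awaitable, with an `interrupted` flag."""

    def __init__(self, interrupted):
        self.interrupted = interrupted

    def __await__(self):
        # Yield control to the event loop once, then complete.
        return asyncio.sleep(0).__await__()


async def speak_in_order(say, lines):
    """Speak lines one at a time; stop early if the user interrupts."""
    spoken = []
    for line in lines:
        handle = say(line)  # keep the reference so status can be checked later
        await handle        # wait for completion before starting the next line
        spoken.append(line)
        if handle.interrupted:
            break           # do not queue more speech after an interruption
    return spoken
```

In an agent, the say argument would be something like self.session.say; here any callable that returns a handle-like object works, which makes the helper easy to exercise in isolation.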

Common Use Cases

Multi-part responses: When function tools need to speak multiple sentences in sequence
Long-running operations: Tasks that should be cancellable when users interrupt
Conversational flows: Scenarios requiring precise timing between utterances

Example - Try It Yourself

Utterance handle example

Check out the interruption-handling implementation built on the utterance handle functionality.

FAQs

Troubleshooting

Issue	Solution
Overlapping speech	Use `await` on handles instead of `create_task()`
Tasks not cancelling on interruption	Check `utterance.interrupted` in loops
Handle is None	Only available during function tool execution via `session.current_utterance`

Correct Usage Pattern

✅ Correct: Sequential Speech

Await each handle to prevent overlapping TTS.

handle1 = session.say("First")
await handle1
handle2 = session.say("Second")
await handle2

❌ Incorrect: Concurrent Speech

Using create_task() causes audio overlap.

asyncio.create_task(session.say("First"))
asyncio.create_task(session.say("Second"))
