Skip to main content
Version: 1.0.x

Context Management

Every Agent carries a ChatContext — the structured conversation memory that records spoken turns, tool calls, tool results, agent-to-agent handoffs, and configuration changes. The same ChatContext is the single source of truth for cascade pipelines, realtime models, and multi-agent flows, which means a call can swap pipelines or hand off between agents mid-conversation without losing memory.

This page covers the conversation-shape primitives: what ChatContext records, how to swap pipelines mid-call, and how to fork/merge contexts for sub-agents.

tip

For automatic summarization and token-budget truncation on long calls, see Context Window.

What ChatContext Records

Unlike a plain message list, ChatContext is a timestamp-ordered log of typed items so the full history — not just the spoken turns — survives a pipeline switch or an agent handoff.

Item TypeWhen It's Recorded
ChatMessageUser and assistant turns (final transcripts), system instructions
FunctionCallEvery tool the agent invoked, with its arguments
FunctionCallOutputThe return value of each tool call (or its error)
AgentHandoffTransfer markers when one agent hands the conversation to another
AgentConfigUpdateMid-call instruction or tool changes
main.py
# Inspect the agent's context at any time
items = self.chat_context.items
recent_user_message = self.chat_context.messages()[-1]

Mid-Call Pipeline Switching

A single AgentSession can switch its pipeline mid-call — from cascade (STT → LLM → TTS) to a realtime speech-to-speech model, or vice versa — via pipeline.change_pipeline(...). The agent's chat_context is not touched by the switch, and the realtime model seeds itself from that context on connect.

tip

For a full mid-call switch (e.g. cascade → realtime), use change_pipeline(...). See Reconfigure Entire Pipeline.

Idempotency

The switch tool stays available on the agent after the switch happens. Without a guard, a realtime model — seeded with a conversation that's all about switching — can loop on the same tool. The self._switched flag above makes the tool a safe no-op on repeat calls.

Supported realtime providers

change_pipeline(...) works with every realtime provider that records back into ChatContext:

ProviderModel class
Google Gemini LiveGeminiRealtime
OpenAI Realtime (GA)OpenAIRealtime
xAIXAIRealtime
UltravoxUltravoxRealtime

Each provider seeds its prior conversation into the realtime session's instructions on connect, so the realtime half starts already aware of what was said and which tools were called.


Multi-Agent Context Patterns

When two or more agents share a conversation, ChatContext provides primitives for transferring control, isolating sub-agent work, and merging results back.

Hand off to a peer agent

add_handoff(...) records a transfer marker on the shared context so the receiving agent's first turn is informed by what the previous agent did and why.

main.py
# In the intake agent, before transferring to billing
self.chat_context.add_handoff(
to_agent="billing",
from_agent="intake",
reason="Caller wants to dispute a charge on order 456.",
)

The billing agent reads the handoff marker on takeover and can greet the caller with full context: "Hi, I see you'd like to dispute the charge on order 456 — let me pull that up."

Spin off a sub-agent (fork / merge)

For a focused side-task, fork a scoped context for a sub-agent, let it work, then merge its result back. The supervisor never loses its main context, and the sub-agent only sees what you choose to expose.

main.py
# Supervisor delegating refund-eligibility check
sub_ctx = self.chat_context.fork_brief(
instructions="You check refund eligibility for one order.",
recent_turns=2,
)

# ... sub-agent runs against sub_ctx ...

# Pull only the sub-agent's final answer back
await self.chat_context.merge_result(sub_ctx, agent_id="refund-checker")
MethodWhat it returns
fork()A copy of the full context
fork_filtered(...)A copy with only the items matching your filter
fork_brief(instructions=..., recent_turns=N)Fresh context with custom instructions + the last N turns
Merge variantWhat it merges back
merge(other)Every item from the sub-agent's context (full audit trail)
merge_result(other, agent_id=...)Only the sub-agent's final assistant message (cheapest)
merge_with_summary(other, llm=..., agent_id=...)An LLM-generated summary of the sub-agent's work

Read-only view

ReadOnlyChatContext wraps a context so a sub-agent can read history but cannot mutate it — handy when you want strong isolation between agents.

main.py
from videosdk.agents import ReadOnlyChatContext

readonly = ReadOnlyChatContext(self.chat_context)
# Pass readonly to a sub-agent; any add_message / merge / truncate raises.

Realtime Tool-Call Recording

When a realtime model invokes a tool, the SDK records both the FunctionCall and the FunctionCallOutput on the agent's chat_context — dedup'd by call_id. This is what makes a realtime → cascade switch work: the cascade LLM on the other side of the switch reads the realtime model's tool calls and results as if it had made them itself, so it doesn't re-call lookup_order(456) it already has the answer to.

No configuration is needed — this happens automatically for every realtime provider listed above.


Reference

Pipeline.change_pipeline(...)

Swap one or more components on a live pipeline. The agent's chat_context is preserved across the switch.

ArgumentTypePurpose
sttSTT | NoneReplace the STT component (cascade mode)
llmLLM | RealtimeBaseModel | NoneReplace the LLM, or swap into a realtime model
ttsTTS | NoneReplace the TTS component
vadVAD | NoneReplace the VAD component
turn_detectorTurnDetector | NoneReplace the turn detector

ChatContext core methods

MethodPurpose
add_message(role, content, ...)Append a ChatMessage
add_function_call(name, arguments, ...)Append a FunctionCall
add_function_output(name, output, call_id, ...)Append a FunctionCallOutput
add_handoff(to_agent, from_agent, reason)Mark an agent-to-agent transfer
messages()Filter to ChatMessage items only
turn_count()Count user turns
estimated_tokens()Rough token estimate (~4 chars/token)
fork() / fork_filtered(...) / fork_brief(...)Create a scoped copy for a sub-agent
merge(other) / merge_result(...) / merge_with_summary(...)Merge a sub-agent's work back, in-place
truncate(max_items=..., max_tokens=...)Manually trim while preserving system + summary items

Examples — Try Out Yourself

Got a Question? Ask us on discord