Context Management
Every Agent carries a ChatContext — the structured conversation memory that records spoken turns, tool calls, tool results, agent-to-agent handoffs, and configuration changes. The same ChatContext is the single source of truth for cascade pipelines, realtime models, and multi-agent flows, which means a call can swap pipelines or hand off between agents mid-conversation without losing memory.
This page covers the conversation-shape primitives: what ChatContext records, how to swap pipelines mid-call, and how to fork/merge contexts for sub-agents.
For automatic summarization and token-budget truncation on long calls, see Context Window.
What ChatContext Records
Unlike a plain message list, ChatContext is a timestamp-ordered log of typed items so the full history — not just the spoken turns — survives a pipeline switch or an agent handoff.
| Item Type | When It's Recorded |
|---|---|
ChatMessage | User and assistant turns (final transcripts), system instructions |
FunctionCall | Every tool the agent invoked, with its arguments |
FunctionCallOutput | The return value of each tool call (or its error) |
AgentHandoff | Transfer markers when one agent hands the conversation to another |
AgentConfigUpdate | Mid-call instruction or tool changes |
# Inspect the agent's context at any time
items = self.chat_context.items
recent_user_message = self.chat_context.messages()[-1]
Mid-Call Pipeline Switching
A single AgentSession can switch its pipeline mid-call — from cascade (STT → LLM → TTS) to a realtime speech-to-speech model, or vice versa — via pipeline.change_pipeline(...). The agent's chat_context is not touched by the switch, and the realtime model seeds itself from that context on connect.
For a full mid-call switch (e.g. cascade → realtime), use change_pipeline(...). See Reconfigure Entire Pipeline.
Idempotency
The switch tool stays available on the agent after the switch happens. Without a guard, a realtime model — seeded with a conversation that's all about switching — can loop on the same tool. The self._switched flag above makes the tool a safe no-op on repeat calls.
Supported realtime providers
change_pipeline(...) works with every realtime provider that records back into ChatContext:
| Provider | Model class |
|---|---|
| Google Gemini Live | GeminiRealtime |
| OpenAI Realtime (GA) | OpenAIRealtime |
| xAI | XAIRealtime |
| Ultravox | UltravoxRealtime |
Each provider seeds its prior conversation into the realtime session's instructions on connect, so the realtime half starts already aware of what was said and which tools were called.
Multi-Agent Context Patterns
When two or more agents share a conversation, ChatContext provides primitives for transferring control, isolating sub-agent work, and merging results back.
Hand off to a peer agent
add_handoff(...) records a transfer marker on the shared context so the receiving agent's first turn is informed by what the previous agent did and why.
# In the intake agent, before transferring to billing
self.chat_context.add_handoff(
to_agent="billing",
from_agent="intake",
reason="Caller wants to dispute a charge on order 456.",
)
The billing agent reads the handoff marker on takeover and can greet the caller with full context: "Hi, I see you'd like to dispute the charge on order 456 — let me pull that up."
Spin off a sub-agent (fork / merge)
For a focused side-task, fork a scoped context for a sub-agent, let it work, then merge its result back. The supervisor never loses its main context, and the sub-agent only sees what you choose to expose.
# Supervisor delegating refund-eligibility check
sub_ctx = self.chat_context.fork_brief(
instructions="You check refund eligibility for one order.",
recent_turns=2,
)
# ... sub-agent runs against sub_ctx ...
# Pull only the sub-agent's final answer back
await self.chat_context.merge_result(sub_ctx, agent_id="refund-checker")
| Method | What it returns |
|---|---|
fork() | A copy of the full context |
fork_filtered(...) | A copy with only the items matching your filter |
fork_brief(instructions=..., recent_turns=N) | Fresh context with custom instructions + the last N turns |
| Merge variant | What it merges back |
|---|---|
merge(other) | Every item from the sub-agent's context (full audit trail) |
merge_result(other, agent_id=...) | Only the sub-agent's final assistant message (cheapest) |
merge_with_summary(other, llm=..., agent_id=...) | An LLM-generated summary of the sub-agent's work |
Read-only view
ReadOnlyChatContext wraps a context so a sub-agent can read history but cannot mutate it — handy when you want strong isolation between agents.
from videosdk.agents import ReadOnlyChatContext
readonly = ReadOnlyChatContext(self.chat_context)
# Pass readonly to a sub-agent; any add_message / merge / truncate raises.
Realtime Tool-Call Recording
When a realtime model invokes a tool, the SDK records both the FunctionCall and the FunctionCallOutput on the agent's chat_context — dedup'd by call_id. This is what makes a realtime → cascade switch work: the cascade LLM on the other side of the switch reads the realtime model's tool calls and results as if it had made them itself, so it doesn't re-call lookup_order(456) it already has the answer to.
No configuration is needed — this happens automatically for every realtime provider listed above.
Reference
Pipeline.change_pipeline(...)
Swap one or more components on a live pipeline. The agent's chat_context is preserved across the switch.
| Argument | Type | Purpose |
|---|---|---|
stt | STT | None | Replace the STT component (cascade mode) |
llm | LLM | RealtimeBaseModel | None | Replace the LLM, or swap into a realtime model |
tts | TTS | None | Replace the TTS component |
vad | VAD | None | Replace the VAD component |
turn_detector | TurnDetector | None | Replace the turn detector |
ChatContext core methods
| Method | Purpose |
|---|---|
add_message(role, content, ...) | Append a ChatMessage |
add_function_call(name, arguments, ...) | Append a FunctionCall |
add_function_output(name, output, call_id, ...) | Append a FunctionCallOutput |
add_handoff(to_agent, from_agent, reason) | Mark an agent-to-agent transfer |
messages() | Filter to ChatMessage items only |
turn_count() | Count user turns |
estimated_tokens() | Rough token estimate (~4 chars/token) |
fork() / fork_filtered(...) / fork_brief(...) | Create a scoped copy for a sub-agent |
merge(other) / merge_result(...) / merge_with_summary(...) | Merge a sub-agent's work back, in-place |
truncate(max_items=..., max_tokens=...) | Manually trim while preserving system + summary items |
Examples — Try Out Yourself
Cascade → Realtime Handoff
Start on cascade with tool calls, switch to a realtime model mid-call; chat_context carries across.
Realtime → Cascade Handoff
Start on a realtime model; switch to cascade. Realtime tool calls survive the switch.
Sequential Agent Handoff
Two peer agents share one ChatContext; intake records add_handoff and billing reads it on takeover.
Supervisor / Sub-Agent
Supervisor forks a scoped context for a refund-check sub-agent and merges the result back.
Got a Question? Ask us on discord

