Version: 1.0.x

Context Management

Every Agent carries a ChatContext — the structured conversation memory that records spoken turns, tool calls, tool results, agent-to-agent handoffs, and configuration changes. The same ChatContext is the single source of truth for cascade pipelines, realtime models, and multi-agent flows, which means a call can swap pipelines or hand off between agents mid-conversation without losing memory.

This page covers the conversation-shape primitives: what ChatContext records, how to swap pipelines mid-call, and how to fork/merge contexts for sub-agents.

tip

For automatic summarization and token-budget truncation on long calls, see Context Window.

What `ChatContext` Records

Unlike a plain message list, ChatContext is a timestamp-ordered log of typed items so the full history — not just the spoken turns — survives a pipeline switch or an agent handoff.

Item Type	When It's Recorded
`ChatMessage`	User and assistant turns (final transcripts), system instructions
`FunctionCall`	Every tool the agent invoked, with its arguments
`FunctionCallOutput`	The return value of each tool call (or its error)
`AgentHandoff`	Transfer markers when one agent hands the conversation to another
`AgentConfigUpdate`	Mid-call instruction or tool changes

main.py
# Inspect the agent's context at any time
items = self.chat_context.items
recent_user_message = self.chat_context.messages()[-1]

Mid-Call Pipeline Switching

A single AgentSession can switch its pipeline mid-call — from cascade (STT → LLM → TTS) to a realtime speech-to-speech model, or vice versa — via pipeline.change_pipeline(...). The agent's chat_context is not touched by the switch, and the realtime model seeds itself from that context on connect.

tip

For a full mid-call switch (e.g. cascade → realtime), use change_pipeline(...). See Reconfigure Entire Pipeline.

Idempotency

The switch tool stays available on the agent after the switch happens. Without a guard, a realtime model — seeded with a conversation that's all about switching — can loop on the same tool. The self._switched flag above makes the tool a safe no-op on repeat calls.

Supported realtime providers

change_pipeline(...) works with every realtime provider that records back into ChatContext:

Provider	Model class
Google Gemini Live	`GeminiRealtime`
OpenAI Realtime (GA)	`OpenAIRealtime`
xAI	`XAIRealtime`
Ultravox	`UltravoxRealtime`

Each provider seeds its prior conversation into the realtime session's instructions on connect, so the realtime half starts already aware of what was said and which tools were called.

Multi-Agent Context Patterns

When two or more agents share a conversation, ChatContext provides primitives for transferring control, isolating sub-agent work, and merging results back.

Hand off to a peer agent

add_handoff(...) records a transfer marker on the shared context so the receiving agent's first turn is informed by what the previous agent did and why.

main.py
# In the intake agent, before transferring to billing
self.chat_context.add_handoff(
    to_agent="billing",
    from_agent="intake",
    reason="Caller wants to dispute a charge on order 456.",
)

The billing agent reads the handoff marker on takeover and can greet the caller with full context: "Hi, I see you'd like to dispute the charge on order 456 — let me pull that up."

Spin off a sub-agent (fork / merge)

For a focused side-task, fork a scoped context for a sub-agent, let it work, then merge its result back. The supervisor never loses its main context, and the sub-agent only sees what you choose to expose.

main.py
# Supervisor delegating refund-eligibility check
sub_ctx = self.chat_context.fork_brief(
    instructions="You check refund eligibility for one order.",
    recent_turns=2,
)

# ... sub-agent runs against sub_ctx ...

# Pull only the sub-agent's final answer back
await self.chat_context.merge_result(sub_ctx, agent_id="refund-checker")

Method	What it returns
`fork()`	A copy of the full context
`fork_filtered(...)`	A copy with only the items matching your filter
`fork_brief(instructions=..., recent_turns=N)`	Fresh context with custom instructions + the last N turns

Merge variant	What it merges back
`merge(other)`	Every item from the sub-agent's context (full audit trail)
`merge_result(other, agent_id=...)`	Only the sub-agent's final assistant message (cheapest)
`merge_with_summary(other, llm=..., agent_id=...)`	An LLM-generated summary of the sub-agent's work

Read-only view

ReadOnlyChatContext wraps a context so a sub-agent can read history but cannot mutate it — handy when you want strong isolation between agents.

main.py
from videosdk.agents import ReadOnlyChatContext

readonly = ReadOnlyChatContext(self.chat_context)
# Pass readonly to a sub-agent; any add_message / merge / truncate raises.

Realtime Tool-Call Recording

When a realtime model invokes a tool, the SDK records both the FunctionCall and the FunctionCallOutput on the agent's chat_context — dedup'd by call_id. This is what makes a realtime → cascade switch work: the cascade LLM on the other side of the switch reads the realtime model's tool calls and results as if it had made them itself, so it doesn't re-call lookup_order(456) it already has the answer to.

No configuration is needed — this happens automatically for every realtime provider listed above.

Reference

`Pipeline.change_pipeline(...)`

Swap one or more components on a live pipeline. The agent's chat_context is preserved across the switch.

Argument	Type	Purpose
`stt`	`STT \| None`	Replace the STT component (cascade mode)
`llm`	`LLM \| RealtimeBaseModel \| None`	Replace the LLM, or swap into a realtime model
`tts`	`TTS \| None`	Replace the TTS component
`vad`	`VAD \| None`	Replace the VAD component
`turn_detector`	`TurnDetector \| None`	Replace the turn detector

`ChatContext` core methods

Method	Purpose
`add_message(role, content, ...)`	Append a `ChatMessage`
`add_function_call(name, arguments, ...)`	Append a `FunctionCall`
`add_function_output(name, output, call_id, ...)`	Append a `FunctionCallOutput`
`add_handoff(to_agent, from_agent, reason)`	Mark an agent-to-agent transfer
`messages()`	Filter to `ChatMessage` items only
`turn_count()`	Count user turns
`estimated_tokens()`	Rough token estimate (~4 chars/token)
`fork()` / `fork_filtered(...)` / `fork_brief(...)`	Create a scoped copy for a sub-agent
`merge(other)` / `merge_result(...)` / `merge_with_summary(...)`	Merge a sub-agent's work back, in-place
`truncate(max_items=..., max_tokens=...)`	Manually trim while preserving system + summary items

Examples — Try Out Yourself

Cascade → Realtime Handoff

Start on cascade with tool calls, switch to a realtime model mid-call; chat_context carries across.

Realtime → Cascade Handoff

Start on a realtime model; switch to cascade. Realtime tool calls survive the switch.

Sequential Agent Handoff

Two peer agents share one ChatContext; intake records add_handoff and billing reads it on takeover.

Supervisor / Sub-Agent

Supervisor forks a scoped context for a refund-check sub-agent and merges the result back.

Got a Question? Ask us on discord

What ChatContext Records​

Mid-Call Pipeline Switching​

Idempotency​

Supported realtime providers​

Multi-Agent Context Patterns​

Hand off to a peer agent​

Spin off a sub-agent (fork / merge)​

Read-only view​

Realtime Tool-Call Recording​

Reference​

Pipeline.change_pipeline(...)​

ChatContext core methods​

Examples — Try Out Yourself​