Memory
Give your AI agents the ability to remember past interactions and user preferences. By integrating a memory provider, your agent can move beyond the limits of its immediate context window to deliver truly personalized and context-aware conversations.
How Memory Enhances Conversations
A standard LLM's memory is limited to its context window. A dedicated memory provider solves this by creating a persistent, intelligent storage layer that recalls information across different sessions.

The agent stores key facts (name, preferences, interests) and retrieves them in later conversations to provide a personalized interaction.
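As a rough sketch of that store/retrieve pattern, here is a toy in-memory version (a plain dict standing in for a real memory provider; the helper names are made up for illustration):

```python
# Hypothetical minimal memory layer: a dict keyed by user ID.
# A real provider persists across sessions and ranks results by relevance.
memory_store: dict[str, list[str]] = {}

def store_fact(user_id: str, fact: str) -> None:
    """Record a fact about a user."""
    memory_store.setdefault(user_id, []).append(fact)

def recall_facts(user_id: str) -> list[str]:
    """Return everything remembered about a user (empty for new users)."""
    return memory_store.get(user_id, [])

# Session 1: the agent learns something about the user
store_fact("user-42", "Name is Alex")
store_fact("user-42", "Prefers morning meetings")

# Session 2: the facts are still available
print(recall_facts("user-42"))  # ['Name is Alex', 'Prefers morning meetings']
```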
Implementation with Mem0
This guide demonstrates how to implement long-term memory using Mem0. We'll build a personal assistant that remembers returning users.
For a complete working example, see the GitHub repository.
Prerequisites
- A Mem0 API key from the Mem0 dashboard.
- Agent environment set up per the AI Voice Agent Quickstart.
Step 1: Create a Memory Manager
Create a class that wraps the Mem0 REST API. It handles fetching, storing, and deciding what's worth remembering.
```python
import httpx


class Mem0Memory:
    STORE_KEYWORDS = (
        "remember", "my name", "i like", "i dislike", "favorite",
        "i prefer", "i love", "i hate", "i'm", "i am", "i work",
    )

    def __init__(self, api_key: str, user_id: str):
        self.user_id = user_id
        self._client = httpx.AsyncClient(
            base_url="https://api.mem0.ai",
            headers={
                "Authorization": f"Token {api_key}",
                "Content-Type": "application/json",
            },
            timeout=10.0,
        )

    async def get_memories(self, limit: int = 5) -> list[str]:
        """Fetch all stored memories for this user."""
        try:
            r = await self._client.get(
                "/v1/memories/", params={"user_id": self.user_id}
            )
            r.raise_for_status()
            data = r.json()
            entries = data if isinstance(data, list) else data.get("results", [])
            return [
                e.get("memory", "")
                for e in entries
                if isinstance(e, dict) and e.get("memory", "").strip()
            ][:limit]
        except Exception:
            return []

    async def search(self, query: str, top_k: int = 5) -> list[str]:
        """Search for memories relevant to the user's current query."""
        try:
            r = await self._client.post(
                "/v1/memories/search/",
                json={"query": query, "user_id": self.user_id, "top_k": top_k},
            )
            r.raise_for_status()
            data = r.json()
            results = data if isinstance(data, list) else data.get("results", [])
            return [
                e.get("memory", "")
                for e in results
                if isinstance(e, dict) and e.get("memory", "").strip()
            ]
        except Exception:
            return []

    async def store(self, user_msg: str, assistant_msg: str | None = None):
        """Store a conversation turn. Mem0 extracts what's worth remembering."""
        messages = [{"role": "user", "content": user_msg}]
        if assistant_msg:
            messages.append({"role": "assistant", "content": assistant_msg})
        r = await self._client.post(
            "/v1/memories/",
            json={"messages": messages, "user_id": self.user_id},
        )
        r.raise_for_status()
```
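The `STORE_KEYWORDS` tuple is not used by the methods shown above; one way it might be used is as a cheap client-side pre-filter, skipping API calls for turns that clearly contain nothing personal. A minimal sketch (the `worth_storing` helper is hypothetical, not part of the class):

```python
STORE_KEYWORDS = (
    "remember", "my name", "i like", "i dislike", "favorite",
    "i prefer", "i love", "i hate", "i'm", "i am", "i work",
)

def worth_storing(text: str, keywords: tuple[str, ...] = STORE_KEYWORDS) -> bool:
    """Return True if the utterance contains any memory-worthy phrase."""
    lowered = text.lower()
    return any(kw in lowered for kw in keywords)

print(worth_storing("My name is Priya and I love hiking"))  # True
print(worth_storing("What's the weather today?"))           # False
```

With a filter like this, `store()` would only be called for turns that pass the check, trading a little recall for fewer network round trips.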
Step 2: Create the Agent with a Personalized Greeting
At session startup, fetch stored memories and inject them into the agent's system prompt. The agent greets returning users differently from new users.
```python
from videosdk.agents import Agent


class PersonalAssistant(Agent):
    def __init__(self, instructions: str, memories: list[str]):
        self._memories = memories
        super().__init__(instructions=instructions)

    async def on_enter(self):
        if self._memories:
            await self.session.say("Hey! Welcome back. How can I help you today?")
        else:
            await self.session.say(
                "Hi there! I'm your personal assistant. "
                "Tell me about yourself so I can remember you next time!"
            )

    async def on_exit(self):
        await self.session.say("Bye! I'll remember everything for next time.")
```
Step 3: Store Memories with Pipeline Hooks
Use two hooks together:

- `user_turn_start`: Search Mem0 for relevant memories and inject them into `chat_context` before the LLM runs. This lets the agent reference past conversations.
- `llm`: After the LLM responds, store the conversation turn. Mem0 automatically extracts what's worth remembering.
Review core concepts in the Pipeline Hooks guide.
```python
pending_msg = None

@pipeline.on("user_turn_start")
async def on_user(transcript: str):
    nonlocal pending_msg
    pending_msg = transcript
    # Search for relevant past memories and inject into context
    relevant = await memory.search(transcript)
    if relevant:
        context = "\n".join(f"- {m}" for m in relevant)
        agent.chat_context.add_message(
            role="system",
            content=f"Relevant memories about this user:\n{context}\n\nUse these to answer personally.",
        )

@pipeline.on("llm")
async def on_llm(data: dict):
    nonlocal pending_msg
    if not memory or not pending_msg:
        pending_msg = None
        return
    # Store every turn; Mem0 decides what's worth remembering
    await memory.store(pending_msg, data.get("text", ""))
    pending_msg = None
```
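To see what the LLM actually receives, here is the injected system message rendered for a hypothetical search result (the two sample memories are made up):

```python
# Sample search results, as returned by memory.search(transcript)
relevant = ["Name is Alex", "Allergic to peanuts"]

# Same formatting as in the user_turn_start hook above
context = "\n".join(f"- {m}" for m in relevant)
system_msg = (
    f"Relevant memories about this user:\n{context}\n\n"
    "Use these to answer personally."
)
print(system_msg)
# Relevant memories about this user:
# - Name is Alex
# - Allergic to peanuts
#
# Use these to answer personally.
```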
Step 4: Wire Everything Together
Initialize the memory manager, build personalized instructions, create the pipeline, register hooks, and start the session.
```python
import os

from videosdk.agents import Agent, AgentSession, Pipeline, WorkerJob, JobContext, RoomOptions
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector


async def entrypoint(ctx: JobContext):
    # 1. Set up memory
    memory = Mem0Memory(api_key=os.getenv("MEM0_API_KEY"), user_id="demo-user")
    memories = await memory.get_memories()

    # 2. Build personalized instructions
    base = "You are a friendly personal assistant. Keep responses short and conversational."
    if memories:
        facts = "\n".join(f"- {m}" for m in memories)
        instructions = f"{base}\n\nYou already know this about the user:\n{facts}"
    else:
        instructions = base

    # 3. Create agent and pipeline
    agent = PersonalAssistant(instructions=instructions, memories=memories)
    pipeline = Pipeline(
        stt=DeepgramSTT(),
        llm=OpenAILLM(),
        tts=ElevenLabsTTS(),
        vad=SileroVAD(),
        turn_detector=TurnDetector(),
    )

    # 4. Register memory hooks (see Step 3)
    pending_msg = None

    @pipeline.on("user_turn_start")
    async def on_user(transcript: str):
        nonlocal pending_msg
        pending_msg = transcript
        # Search and inject relevant memories before the LLM runs
        relevant = await memory.search(transcript)
        if relevant:
            context = "\n".join(f"- {m}" for m in relevant)
            agent.chat_context.add_message(
                role="system",
                content=f"Relevant memories about this user:\n{context}\n\nUse these to answer personally.",
            )

    @pipeline.on("llm")
    async def on_llm(data: dict):
        nonlocal pending_msg
        if not pending_msg:
            return
        # Store every turn; Mem0 decides what's worth remembering
        await memory.store(pending_msg, data.get("text", ""))
        pending_msg = None

    # 5. Start the session
    session = AgentSession(agent=agent, pipeline=pipeline)
    await session.start(wait_for_participant=True, run_until_shutdown=True)


def make_context() -> JobContext:
    return JobContext(room_options=RoomOptions(name="Personal Assistant", playground=True))


if __name__ == "__main__":
    WorkerJob(entrypoint=entrypoint, jobctx=make_context).start()
```
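The instruction-building step (part 2 of the entrypoint) is plain string logic, so it can be exercised in isolation. A sketch, with the `build_instructions` helper introduced here only for illustration:

```python
def build_instructions(base: str, memories: list[str]) -> str:
    """Append known user facts to the base prompt; new users get the base prompt as-is."""
    if not memories:
        return base
    facts = "\n".join(f"- {m}" for m in memories)
    return f"{base}\n\nYou already know this about the user:\n{facts}"

base = "You are a friendly personal assistant. Keep responses short and conversational."

# New user: no memories, so instructions are unchanged
print(build_instructions(base, []) == base)  # True

# Returning user: memories are appended as a bulleted fact list
print(build_instructions(base, ["Name is Alex", "Works in finance"]))
```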