Version: 1.0.x

Memory

Give your AI agents the ability to remember past interactions and user preferences. By integrating a memory provider, your agent can move beyond the limits of its immediate context window to deliver truly personalized and context-aware conversations.

How Memory Enhances Conversations

A standard LLM's memory is limited to its context window. A dedicated memory provider solves this by creating a persistent, intelligent storage layer that recalls information across different sessions.

Memory-enabled Conversation Flow

The agent stores key facts (name, preferences, interests) and retrieves them in later conversations to provide a personalized interaction.
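The flow above reduces to two operations: store facts when a turn ends, recall them when the next session starts. A minimal sketch of that cycle, using a plain in-memory dict as a stand-in for a real provider like Mem0 (all names here are illustrative):

```python
# Illustration of the store/recall cycle. A real provider (e.g. Mem0)
# persists across processes; a dict only persists within one.

class InMemoryStore:
    def __init__(self):
        self._memories: dict[str, list[str]] = {}

    def store(self, user_id: str, fact: str) -> None:
        self._memories.setdefault(user_id, []).append(fact)

    def recall(self, user_id: str) -> list[str]:
        return self._memories.get(user_id, [])

# Session 1: the agent learns a fact and persists it.
store = InMemoryStore()
store.store("demo-user", "Name is Alice; prefers short answers")

# Session 2 (fresh context window): recalled facts are injected into the prompt.
facts = store.recall("demo-user")
prompt = "You are a helpful assistant.\nKnown about the user:\n" + "\n".join(
    f"- {f}" for f in facts
)
```

The persistent layer is what lets session 2 start from the facts session 1 learned, even though the LLM's context window is empty.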

Implementation with Mem0

This guide demonstrates how to implement long-term memory using Mem0. We'll build a personal assistant that remembers returning users.

note

For a complete working example, see the GitHub repository.

Prerequisites

Based on the code below, you will need:

  • A Mem0 API key, exported as the MEM0_API_KEY environment variable.
  • API keys for the pipeline providers used in this example (Deepgram, OpenAI, ElevenLabs).
  • The VideoSDK Agents SDK with the corresponding plugins installed.

Step 1: Create a Memory Manager

Create a class that wraps the Mem0 REST API. It handles fetching, searching, and storing memories; Mem0 itself extracts what's worth remembering from each stored turn.

main.py
import httpx

class Mem0Memory:
    # Keywords that suggest a turn contains a personal fact. Not used by the
    # methods below (Mem0 extracts facts server-side), but useful if you want
    # to filter client-side before calling store().
    STORE_KEYWORDS = (
        "remember", "my name", "i like", "i dislike", "favorite",
        "i prefer", "i love", "i hate", "i'm", "i am", "i work",
    )

    def __init__(self, api_key: str, user_id: str):
        self.user_id = user_id
        self._client = httpx.AsyncClient(
            base_url="https://api.mem0.ai",
            headers={
                "Authorization": f"Token {api_key}",
                "Content-Type": "application/json",
            },
            timeout=10.0,
        )

    async def get_memories(self, limit: int = 5) -> list[str]:
        """Fetch all stored memories for this user."""
        try:
            r = await self._client.get(
                "/v1/memories/", params={"user_id": self.user_id}
            )
            r.raise_for_status()
            data = r.json()
            entries = data if isinstance(data, list) else data.get("results", [])
            return [
                e.get("memory", "")
                for e in entries
                if isinstance(e, dict) and e.get("memory", "").strip()
            ][:limit]
        except Exception:
            return []

    async def search(self, query: str, top_k: int = 5) -> list[str]:
        """Search for memories relevant to the user's current query."""
        try:
            r = await self._client.post(
                "/v1/memories/search/",
                json={"query": query, "user_id": self.user_id, "top_k": top_k},
            )
            r.raise_for_status()
            data = r.json()
            results = data if isinstance(data, list) else data.get("results", [])
            return [
                e.get("memory", "")
                for e in results
                if isinstance(e, dict) and e.get("memory", "").strip()
            ]
        except Exception:
            return []

    async def store(self, user_msg: str, assistant_msg: str | None = None):
        """Store a conversation turn. Mem0 extracts what's worth remembering."""
        messages = [{"role": "user", "content": user_msg}]
        if assistant_msg:
            messages.append({"role": "assistant", "content": assistant_msg})
        r = await self._client.post(
            "/v1/memories/", json={"messages": messages, "user_id": self.user_id}
        )
        r.raise_for_status()
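The STORE_KEYWORDS tuple isn't wired into the class as shown, since Mem0 extracts facts server-side. One way to use it is as a client-side gate so only turns that likely contain personal facts trigger a write (saving API round-trips). A hypothetical helper built on that idea:

```python
# Same keyword list as Mem0Memory.STORE_KEYWORDS above.
STORE_KEYWORDS = (
    "remember", "my name", "i like", "i dislike", "favorite",
    "i prefer", "i love", "i hate", "i'm", "i am", "i work",
)

def worth_storing(user_msg: str) -> bool:
    """Heuristic gate: True if the turn looks like it contains a personal fact."""
    msg = user_msg.lower()
    return any(kw in msg for kw in STORE_KEYWORDS)
```

You could then call `await memory.store(...)` only when `worth_storing(transcript)` is true; "My name is Alice" passes the gate, while "What's the weather?" does not. The trade-off is that a keyword heuristic misses facts phrased unusually, which is why the guide stores every turn and lets Mem0 decide.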

Step 2: Create the Agent with Personalized Greeting

At session startup, fetch stored memories and inject them into the agent's system prompt. The agent greets returning users differently from new users.

main.py
from videosdk.agents import Agent

class PersonalAssistant(Agent):
    def __init__(self, instructions: str, memories: list[str]):
        self._memories = memories
        super().__init__(instructions=instructions)

    async def on_enter(self):
        if self._memories:
            await self.session.say("Hey! Welcome back. How can I help you today?")
        else:
            await self.session.say(
                "Hi there! I'm your personal assistant. "
                "Tell me about yourself so I can remember you next time!"
            )

    async def on_exit(self):
        await self.session.say("Bye! I'll remember everything for next time.")

Step 3: Store Memories with Pipeline Hooks

Use two hooks together:

  • user_turn_start — Search Mem0 for relevant memories and inject them into chat_context before the LLM runs. This lets the agent reference past conversations.
  • llm — After the LLM responds, store the conversation turn. Mem0 automatically extracts what's worth remembering.
tip

Review core concepts in the Pipeline Hooks guide.

main.py
# These hooks live inside the entrypoint (see Step 4), where `memory`,
# `agent`, and `pipeline` are in scope.
pending_msg = None

@pipeline.on("user_turn_start")
async def on_user(transcript: str):
    nonlocal pending_msg
    pending_msg = transcript

    # Search for relevant past memories and inject them into context
    relevant = await memory.search(transcript)
    if relevant:
        context = "\n".join(f"- {m}" for m in relevant)
        agent.chat_context.add_message(
            role="system",
            content=(
                f"Relevant memories about this user:\n{context}\n\n"
                "Use these to answer personally."
            ),
        )

@pipeline.on("llm")
async def on_llm(data: dict):
    nonlocal pending_msg
    if not pending_msg:
        return
    # Store every turn — Mem0 decides what's worth remembering
    await memory.store(pending_msg, data.get("text", ""))
    pending_msg = None
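The two hooks coordinate through pending_msg: user_turn_start records the transcript, and the llm hook pairs it with the model's reply, then clears it so a repeated llm event becomes a no-op. A self-contained simulation of that handoff (stub memory object and hand-called hooks standing in for the real pipeline):

```python
import asyncio

class StubMemory:
    """Stand-in for Mem0Memory that records store() calls instead of calling the API."""
    def __init__(self):
        self.stored: list[tuple[str, str]] = []

    async def store(self, user_msg: str, assistant_msg: str = "") -> None:
        self.stored.append((user_msg, assistant_msg))

memory = StubMemory()

async def simulate_turn() -> None:
    pending_msg = None

    async def on_user(transcript: str):
        nonlocal pending_msg
        pending_msg = transcript  # user_turn_start fires before the LLM runs

    async def on_llm(data: dict):
        nonlocal pending_msg
        if not pending_msg:
            return  # nothing pending: ignore the event
        await memory.store(pending_msg, data.get("text", ""))
        pending_msg = None  # cleared, so a duplicate llm event stores nothing

    await on_user("I love hiking")
    await on_llm({"text": "Noted, hiking it is!"})
    await on_llm({"text": "duplicate event"})  # ignored: pending_msg already cleared

asyncio.run(simulate_turn())
```

This is why the guard in the llm hook matters: without clearing pending_msg, one user turn could be stored multiple times if the pipeline emits more than one llm event.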

Step 4: Wire Everything Together

Initialize the memory manager, build personalized instructions, create the pipeline, register hooks, and start the session.

main.py
import os
from videosdk.agents import Agent, AgentSession, Pipeline, WorkerJob, JobContext, RoomOptions
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector

async def entrypoint(ctx: JobContext):
    # 1. Set up memory
    memory = Mem0Memory(api_key=os.getenv("MEM0_API_KEY"), user_id="demo-user")
    memories = await memory.get_memories()

    # 2. Build personalized instructions
    base = "You are a friendly personal assistant. Keep responses short and conversational."
    if memories:
        facts = "\n".join(f"- {m}" for m in memories)
        instructions = f"{base}\n\nYou already know this about the user:\n{facts}"
    else:
        instructions = base

    # 3. Create agent and pipeline
    agent = PersonalAssistant(instructions=instructions, memories=memories)
    pipeline = Pipeline(
        stt=DeepgramSTT(),
        llm=OpenAILLM(),
        tts=ElevenLabsTTS(),
        vad=SileroVAD(),
        turn_detector=TurnDetector(),
    )

    # 4. Register memory hooks (see Step 3)
    pending_msg = None

    @pipeline.on("user_turn_start")
    async def on_user(transcript: str):
        nonlocal pending_msg
        pending_msg = transcript
        # Search and inject relevant memories before the LLM runs
        relevant = await memory.search(transcript)
        if relevant:
            context = "\n".join(f"- {m}" for m in relevant)
            agent.chat_context.add_message(
                role="system",
                content=(
                    f"Relevant memories about this user:\n{context}\n\n"
                    "Use these to answer personally."
                ),
            )

    @pipeline.on("llm")
    async def on_llm(data: dict):
        nonlocal pending_msg
        if not pending_msg:
            return
        # Store every turn — Mem0 decides what's worth remembering
        await memory.store(pending_msg, data.get("text", ""))
        pending_msg = None

    # 5. Start the session
    session = AgentSession(agent=agent, pipeline=pipeline)
    await session.start(wait_for_participant=True, run_until_shutdown=True)

def make_context() -> JobContext:
    return JobContext(room_options=RoomOptions(name="Personal Assistant", playground=True))

if __name__ == "__main__":
    WorkerJob(entrypoint=entrypoint, jobctx=make_context).start()
