Graph-based Agent Memory

Research summary — May 29, 2026. Sources: arXiv:2602.05665, arXiv:2604.12285, arXiv:2601.03236

Why Graph-based Agent Memory?

Traditional LLM agents hit three fundamental walls: knowledge cutoff (static training data), tool incompetence (they can’t efficiently learn new tools at runtime), and performance saturation (they repeat the same errors because they can’t accumulate experience). Memory addresses all three, but how that memory is stored matters a great deal.

Simple memory approaches (flat token windows, vector databases, key-value stores) can retrieve semantically similar content but struggle with relational reasoning: questions like “what changed between these two events?” or “how does concept A relate to concept B via C?” Graphs address this by encoding relationships, hierarchy, and causality natively as nodes and edges.
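To make the “A relates to B via C” case concrete, here is a minimal sketch of a multi-hop lookup over a toy in-memory graph. The entities, relation names, and the `find_path` helper are all hypothetical and chosen for illustration; a real system would run an equivalent traversal inside a graph database.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (relation, neighbor) pairs.
GRAPH = {
    "Alice": [("works_at", "Acme")],
    "Acme": [("acquired_by", "Globex")],
    "Globex": [("headquartered_in", "Berlin")],
}

def find_path(graph, start, goal):
    """Breadth-first search returning the shortest chain of
    (subject, relation, object) hops from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None  # the two nodes are not connected

# "How does Alice relate to Globex?" is answered by the chain of hops.
path = find_path(GRAPH, "Alice", "Globex")
```

A similarity search could surface each of these facts separately, but the chain linking them only appears once the edges are followed.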

Memory Taxonomy

By time horizon: Short-term (volatile, current session context) vs. Long-term (persists across sessions, accumulated knowledge and preferences).

By cognitive type:

Semantic memory — decontextualized world knowledge
Episodic memory — chronological past sessions, autobiographical history
Procedural memory — skills, routines, rules
Working memory — agent’s scratchpad for current reasoning
Associative memory — latent links between concepts
Sentiment memory — emotional tone/feedback from interactions

By content type:

Knowledge memory — passive, static, objective facts
Experience memory — dynamic personal logbook of interactions, outcomes, failures

The Memory Lifecycle

1. Extraction — Raw observations → structured memory units. LLM-based entity/relation extraction produces (subject, relation, object) triples from text, trajectories, or multimodal data.
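As a rough illustration of what extraction produces, the sketch below stores (subject, relation, object) triples and indexes them by subject. The LLM call itself is not shown: the `extracted` list is a hand-written stand-in for extractor output, and `Triple` and `build_graph` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str

def build_graph(triples):
    """Index triples by subject so outgoing edges can be looked up directly."""
    graph = defaultdict(list)
    for t in triples:
        if t not in graph[t.subject]:  # skip exact duplicate mentions
            graph[t.subject].append(t)
    return dict(graph)

# Hand-written stand-in for what an extractor might return for
# "Alice moved to Berlin and joined Acme." (not real model output).
extracted = [
    Triple("Alice", "lives_in", "Berlin"),
    Triple("Alice", "works_at", "Acme"),
    Triple("Alice", "works_at", "Acme"),  # repeated mention
]
memory = build_graph(extracted)
```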

2. Storage — Four main graph shapes, plus hybrids that combine them:

Structure	Best for
Knowledge Graph	Semantic facts, stable entity relationships
Temporal Graph	Time-ordered events, how facts change over time
Hierarchical	Multi-level abstraction (topic → event → utterance)
Hypergraph	N-ary relations beyond simple subject-predicate-object
Hybrid	Combining the above — used in production systems
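The temporal graph row is perhaps the least familiar of these shapes, so here is a minimal sketch of edges that carry a validity interval. It assumes a single time axis (when the fact was true) and half-open intervals; `TemporalEdge` and `facts_as_of` are hypothetical names, not any library's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class TemporalEdge:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact still holds

def facts_as_of(edges, when):
    """Return edges whose interval [valid_from, valid_to) contains `when`."""
    return [
        e for e in edges
        if e.valid_from <= when and (e.valid_to is None or when < e.valid_to)
    ]

# An updated fact closes the old edge instead of overwriting it.
edges = [
    TemporalEdge("Alice", "lives_in", "Paris", date(2020, 1, 1), date(2024, 6, 1)),
    TemporalEdge("Alice", "lives_in", "Berlin", date(2024, 6, 1)),
]
then = [e.obj for e in facts_as_of(edges, date(2023, 1, 1))]
now = [e.obj for e in facts_as_of(edges, date(2025, 1, 1))]
```

Keeping the closed edge is what lets the graph answer “what changed?” questions: both the old and the new value remain queryable.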

3. Retrieval — Six operator types (roughly in order of sophistication): similarity-based, rule-based (Cypher/SPARQL), temporal-based, graph traversal, RL-based, agent-based. Production systems like Graphiti use three-way hybrid retrieval (BM25 + cosine similarity + graph traversal), with a reported 94.8% on the Deep Memory Retrieval (DMR) benchmark.
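One way to combine several retrievers is to fuse their ranked result lists. The sketch below uses reciprocal rank fusion, which is an assumption on my part: the summary above does not say how the three result sets are merged. The node IDs and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs: each list contributes 1 / (k + rank) per item."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical outputs of the three retrievers, best match first.
bm25_hits = ["n3", "n1", "n7"]
vector_hits = ["n1", "n3", "n5"]
graph_hits = ["n1", "n9"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
```

Rank-based fusion avoids having to normalize BM25 scores, cosine similarities, and hop counts onto a common scale, which is why it is a common default.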

4. Evolution — Three self-evolution mechanisms:

Consolidation — merging redundant/conflicting nodes
Graph reasoning — inferring new edges from existing ones
Graph reorganization — restructuring as topics shift
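The first two mechanisms can be sketched on plain triples. This is a toy under stated assumptions: the alias map is supplied by hand (a real system would have to detect duplicates itself), and the only inference rule is transitivity of a single relation. `consolidate` and `infer_transitive` are hypothetical helpers.

```python
def consolidate(triples, aliases):
    """Rewrite alias names to canonical names and drop resulting duplicates."""
    merged = []
    for s, r, o in triples:
        t = (aliases.get(s, s), r, aliases.get(o, o))
        if t not in merged:
            merged.append(t)
    return merged

def infer_transitive(triples, relation):
    """Add (a, relation, c) whenever (a, relation, b) and (b, relation, c)
    both exist, repeating until no new edge appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for a, r1, b in list(facts):
            for b2, r2, c in list(facts):
                new = (a, relation, c)
                if r1 == r2 == relation and b == b2 and a != c and new not in facts:
                    facts.add(new)
                    changed = True
    return facts

triples = [
    ("NYC", "located_in", "New York State"),
    ("New York City", "located_in", "New York State"),
    ("New York State", "located_in", "USA"),
]
clean = consolidate(triples, {"NYC": "New York City"})
closed = infer_transitive(clean, "located_in")
```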

The Core Engineering Problem: Stability vs. Plasticity

Unified stream systems (MemGPT, Mem0, A-Mem) update memory continuously. They are fast to write but suffer from memory contamination: transient, noisy dialogue corrupts stable long-term knowledge. Topics bleed into each other (semantic drift), and established facts become isolated and forgotten (memory loss).

Discrete structured systems (GraphRAG, LightRAG) are stable but rigid: strong at static fact retrieval, poor at tracking evolving narratives in real time.

GAM’s Solution: Hierarchical Decoupling with Semantic-Triggered Consolidation

(arXiv:2604.12285, Apr 2026; reported to outperform prior state-of-the-art systems on LoCoMo and LongDialQA)

Two separate graph layers:

Event Progression Graph (local, fast, append-only) — captures real-time dialogue as atomic event nodes
Topic Associative Network (global, stable) — consolidated semantic themes

The transition between layers is triggered by semantic divergence detection: an LLM checks whether the accumulated local context has drifted far enough from the global state to justify consolidation. Only at these semantic boundaries is the local buffer summarized and merged into the global graph (a design inspired by sleep-dependent memory consolidation).
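The buffering pattern can be sketched in a few lines. This is not GAM's implementation: the divergence check here is a toy word-overlap (Jaccard) test standing in for the LLM judgment, the summary is a plain string join standing in for LLM summarization, and the class name, threshold, and field names are all hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap in [0, 1]; 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

class TwoLayerMemory:
    def __init__(self, threshold=0.2):
        self.threshold = threshold
        self.buffer = []  # local layer: append-only list of recent events
        self.topics = []  # global layer: consolidated topic entries

    def _diverged(self, event):
        """Toy stand-in for the LLM check: low overlap with buffered context."""
        if not self.buffer:
            return False
        context = [w for e in self.buffer for w in e.lower().split()]
        return jaccard(context, event.lower().split()) < self.threshold

    def _consolidate(self):
        # Placeholder summary; the archived events are kept alongside it so
        # retrieval can later drill down from the topic to the details.
        self.topics.append(
            {"summary": " | ".join(self.buffer), "events": list(self.buffer)}
        )
        self.buffer = []

    def add(self, event):
        # Consolidate only when the new event marks a semantic boundary.
        if self._diverged(event):
            self._consolidate()
        self.buffer.append(event)

mem = TwoLayerMemory()
for utterance in [
    "book a flight to Berlin",
    "the flight to Berlin leaves Monday",
    "my cat needs a vet appointment",
]:
    mem.add(utterance)
```

In this run the first two utterances stay in the local buffer together, and the topic change in the third triggers a single consolidation, so the global layer is written once rather than on every turn.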

Retrieval does a top-down traversal: semantic anchors in the global topic network → follow cross-layer links → drill into archived event graphs for precise episodic details. Multi-factor reranking applies temporal recency, confidence scores, and speaker role signals.
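A multi-factor reranker over those three signals might look like the following sketch. The multiplicative form, the 30-day half-life, and the role weights are illustrative assumptions, not values taken from the paper, and `Candidate`, `score`, and `rerank` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    age_days: float    # time since the event was recorded
    confidence: float  # 0..1, how much the stored fact is trusted
    speaker: str       # "user" or "assistant"

# Illustrative assumptions: tune or replace for a real system.
ROLE_WEIGHT = {"user": 1.0, "assistant": 0.8}
HALF_LIFE_DAYS = 30.0

def score(c):
    """Combine recency decay, confidence, and speaker role multiplicatively."""
    recency = 0.5 ** (c.age_days / HALF_LIFE_DAYS)  # halves every 30 days
    return recency * c.confidence * ROLE_WEIGHT.get(c.speaker, 0.5)

def rerank(candidates):
    return sorted(candidates, key=score, reverse=True)

candidates = [
    Candidate("old but certain", age_days=30, confidence=1.0, speaker="user"),
    Candidate("fresh, from assistant", age_days=0, confidence=0.9, speaker="assistant"),
    Candidate("fresh, from user", age_days=0, confidence=0.9, speaker="user"),
]
ranked = [c.text for c in rerank(candidates)]
```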

Production-Ready Systems

System	Key Features	Benchmark
Graphiti / Zep	Temporal KG on Neo4j, bitemporal versioning, BM25+vector+graph hybrid retrieval	94.8% on DMR; up to +18.5% accuracy on LongMemEval
MAGMA (arXiv:2601.03236)	Decouples memory representation from retrieval logic, transparent reasoning paths	SOTA on long-horizon reasoning
Mem0	Most widely deployed; hierarchical OS-like tiers; lacks write isolation	—

Engineering Takeaways

Vector-only RAG is insufficient for agents that need relational reasoning across sessions or events.
The GAM buffering pattern (fast local append → semantic-boundary consolidation → stable global graph) is implementable with Neo4j or FalkorDB.
Hybrid retrieval (BM25 + vector + graph) beats any single modality — don’t pick just one.
Memory types are orthogonal to storage architecture, but graphs make multi-hop relational reasoning a native traversal operation rather than something reconstructed at query time.
If building multi-session conversational agents, plan for both episodic and semantic memory layers from the start.
