Graph-based Agent Memory

Research summary — May 29, 2026. Sources: arXiv:2602.05665, arXiv:2604.12285, arXiv:2601.03236


Why Graph-based Agent Memory?

Traditional LLM agents hit three fundamental walls: knowledge cutoff (static training data), tool incompetence (can’t efficiently learn new tools at runtime), and performance saturation (agents repeat the same errors because they can’t accumulate experience). Memory addresses all three, but how that memory is stored matters a great deal.

Simple memory approaches (flat token windows, vector databases, key-value stores) can retrieve semantically similar content but struggle with relational reasoning — questions like “what changed between these two events?” or “how does concept A relate to concept B via C?” Graphs solve this by natively encoding relationships, hierarchy, and causality.


Memory Taxonomy

By time horizon: Short-term (volatile, current session context) vs. Long-term (persists across sessions, accumulated knowledge and preferences).

By cognitive type:

  • Semantic memory — decontextualized world knowledge
  • Episodic memory — chronological past sessions, autobiographical history
  • Procedural memory — skills, routines, rules
  • Working memory — agent’s scratchpad for current reasoning
  • Associative memory — latent links between concepts
  • Sentiment memory — emotional tone/feedback from interactions

By content type:

  • Knowledge memory — passive, static, objective facts
  • Experience memory — dynamic personal logbook of interactions, outcomes, failures

The Memory Lifecycle

1. Extraction — Raw observations → structured memory units. LLM-based entity/relation extraction produces (subject, relation, object) triples from text, trajectories, or multimodal data.

2. Storage — Four main graph shapes, plus hybrids that combine them:

Structure | Best for
Knowledge Graph | Semantic facts, stable entity relationships
Temporal Graph | Time-ordered events, how facts change over time
Hierarchical | Multi-level abstraction (topic → event → utterance)
Hypergraph | N-ary relations beyond simple subject-predicate-object
Hybrid | Combining the above; used in production systems
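
As a rough illustration of the temporal graph row, the sketch below stores (subject, relation, object) triples with a validity interval, so that an updated fact closes the old edge instead of overwriting it. The `Fact` and `TemporalGraph` names are hypothetical, timestamps are plain integers, and each relation is assumed to be single-valued; none of this is taken from the cited papers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    """One (subject, relation, object) triple with a validity interval."""
    subject: str
    relation: str
    obj: str
    valid_from: int                 # time at which the fact became true
    valid_to: Optional[int] = None  # None means the fact is still true


class TemporalGraph:
    """Toy temporal graph: superseded facts are closed, never deleted."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, relation: str, obj: str, at: int) -> None:
        # Close any open fact for the same subject and relation.
        # Assumption: the relation is single-valued (e.g. "lives_in").
        for fact in self.facts:
            same_key = (fact.subject, fact.relation) == (subject, relation)
            if same_key and fact.valid_to is None:
                fact.valid_to = at
        self.facts.append(Fact(subject, relation, obj, valid_from=at))

    def query(self, subject: str, relation: str, at: int) -> Optional[str]:
        # Return the object that was valid at time `at`, or None.
        for fact in self.facts:
            end = fact.valid_to if fact.valid_to is not None else float("inf")
            same_key = (fact.subject, fact.relation) == (subject, relation)
            if same_key and fact.valid_from <= at < end:
                return fact.obj
        return None


graph = TemporalGraph()
graph.assert_fact("alice", "lives_in", "Paris", at=10)
graph.assert_fact("alice", "lives_in", "Berlin", at=20)
print(graph.query("alice", "lives_in", at=15))  # Paris
print(graph.query("alice", "lives_in", at=25))  # Berlin
```

Because the old edge is kept with a closed interval, a question such as “what changed between these two events?” becomes a comparison of two point-in-time queries.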

3. Retrieval — Six operator types, roughly in order of sophistication: similarity-based, rule-based (Cypher/SPARQL), temporal-based, graph traversal, RL-based, and agent-based. Production systems like Graphiti use hybrid retrieval across three modalities (BM25, cosine similarity, and graph traversal); Graphiti is reported at 94.8% on the Deep Memory Retrieval benchmark.
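
One way to picture hybrid retrieval is to rank candidates separately by each signal and then fuse the rankings. The sketch below is a toy under stated assumptions: token overlap stands in for BM25, the embeddings are made up, and reciprocal rank fusion is used as the combiner. This summary does not say how Graphiti actually merges its three signals, so the fusion step is an illustrative choice, not a description of that system.

```python
import math
from collections import deque

# Hypothetical toy memory: node id -> (text, made-up 2-d embedding).
NODES = {
    "n1": ("alice moved to berlin", [0.9, 0.1]),
    "n2": ("alice adopted a cat", [0.2, 0.8]),
    "n3": ("berlin office opened", [0.7, 0.3]),
}
EDGES = {"n1": ["n3"], "n2": ["n1"], "n3": ["n1"]}  # adjacency list


def lexical_rank(query: str) -> list[str]:
    # Stand-in for BM25: rank nodes by number of shared tokens.
    tokens = set(query.split())
    return sorted(NODES, key=lambda n: -len(tokens & set(NODES[n][0].split())))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def vector_rank(query_vec: list[float]) -> list[str]:
    return sorted(NODES, key=lambda n: -cosine(query_vec, NODES[n][1]))


def graph_rank(seed: str) -> list[str]:
    # Breadth-first search from the seed; closer nodes rank higher.
    # Unreachable nodes are left out of this ranking.
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for neighbor in EDGES.get(current, []):
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return sorted(dist, key=lambda n: dist[n])


def fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: each ranking contributes 1 / (k + position).
    scores: dict[str, float] = {}
    for ranking in rankings:
        for position, node in enumerate(ranking, start=1):
            scores[node] = scores.get(node, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=lambda n: -scores[n])


results = fuse([
    lexical_rank("alice berlin"),
    vector_rank([1.0, 0.0]),
    graph_rank("n1"),
])
print(results)  # ['n1', 'n3', 'n2']
```

Here `n3` overtakes `n2` even though both match one query token, because the vector and graph signals both favor it.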

4. Evolution — Three self-evolution mechanisms:

  • Consolidation — merging redundant/conflicting nodes
  • Graph reasoning — inferring new edges from existing ones
  • Graph reorganization — restructuring as topics shift
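
The first two mechanisms can be sketched on a plain list of triples. In this minimal example, consolidation merges nodes whose names match after normalization (a crude stand-in for real entity resolution), and graph reasoning adds one round of transitive edges for a chosen relation. The function names are hypothetical, and conflict resolution and reorganization are not covered.

```python
Triple = tuple[str, str, str]


def canon(name: str) -> str:
    # Normalize case and whitespace so trivially different names collide.
    return " ".join(name.lower().split())


def consolidate(edges: list[Triple]) -> list[Triple]:
    """Merge nodes with the same normalized name and drop duplicate edges."""
    seen: set[Triple] = set()
    merged: list[Triple] = []
    for subject, relation, obj in edges:
        triple = (canon(subject), relation, canon(obj))
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged


def infer_transitive(edges: list[Triple], relation: str) -> list[Triple]:
    """Add (a, r, c) when (a, r, b) and (b, r, c) both exist. Single pass."""
    existing = set(edges)
    inferred: list[Triple] = []
    for a, r1, b in edges:
        for b2, r2, c in edges:
            if r1 == relation and r2 == relation and b == b2 and a != c:
                candidate = (a, relation, c)
                if candidate not in existing:
                    existing.add(candidate)
                    inferred.append(candidate)
    return edges + inferred


raw = [
    ("Alice", "works_at", "Acme Corp"),
    ("alice ", "works_at", "acme  corp"),  # redundant variant of the first edge
    ("Acme Corp", "located_in", "Berlin"),
    ("Berlin", "located_in", "Germany"),
]
clean = consolidate(raw)
enriched = infer_transitive(clean, "located_in")
print(len(clean))    # 3
print(enriched[-1])  # ('acme corp', 'located_in', 'germany')
```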

The Core Engineering Problem: Stability vs. Plasticity

Unified stream systems (MemGPT, Mem0, A-Mem) update memory continuously. Writes are fast, but these systems suffer from memory contamination: transient, noisy dialogue corrupts stable long-term knowledge. Topics bleed into each other (semantic drift), and established facts become isolated and forgotten (memory loss).

Discrete structured systems (GraphRAG, LightRAG) are stable but rigid: strong at static fact retrieval, weak at tracking evolving narratives in real time.

GAM’s Solution: Hierarchical Decoupling with Semantic-Triggered Consolidation

(arXiv:2604.12285, Apr 2026; reported to outperform prior state-of-the-art methods on LoCoMo and LongDialQA)

Two separate graph layers:

  • Event Progression Graph (local, fast, append-only) — captures real-time dialogue as atomic event nodes
  • Topic Associative Network (global, stable) — consolidated semantic themes

Transition is triggered by semantic divergence detection — an LLM checks whether accumulated local context has drifted enough from the global state to justify consolidation. Only at semantic boundaries does the local buffer get summarized and merged into the global graph (inspired by sleep-dependent memory consolidation).
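
The buffering pattern described above can be sketched as a small class. Two substitutions are assumptions made for this example: a cosine-distance threshold on made-up embeddings replaces the LLM-based divergence check, and summarization is a simple string join. The `TwoLayerMemory` name and its fields are hypothetical and are not the paper's implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def centroid(vectors: list[list[float]]) -> list[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


class TwoLayerMemory:
    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.local_events: list[tuple[str, list[float]]] = []  # fast local buffer
        self.global_topics: list[dict] = []                    # stable global layer

    def add_event(self, text: str, vec: list[float]) -> None:
        # Consolidate only when the new event diverges from the buffered context.
        # Assumption: embedding distance stands in for the LLM divergence check.
        if self.local_events:
            context = centroid([v for _, v in self.local_events])
            if 1.0 - cosine(context, vec) > self.threshold:
                self._consolidate()
        self.local_events.append((text, vec))

    def _consolidate(self) -> None:
        texts = [t for t, _ in self.local_events]
        # Placeholder summary; the archived events stay linked to the topic.
        self.global_topics.append({"summary": " / ".join(texts), "events": texts})
        self.local_events = []


memory = TwoLayerMemory(threshold=0.5)
memory.add_event("booked a flight to Rome", [1.0, 0.0])
memory.add_event("picked a hotel in Rome", [0.9, 0.1])
memory.add_event("asked about tax filing", [0.0, 1.0])  # topic shift
print(len(memory.global_topics))  # 1
print(len(memory.local_events))   # 1
```

The first two events stay in the local buffer because they point in nearly the same direction; the third is far enough away to flush the buffer into one global topic before it is appended.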

Retrieval does a top-down traversal: semantic anchors in the global topic network → follow cross-layer links → drill into archived event graphs for precise episodic details. Multi-factor reranking applies temporal recency, confidence scores, and speaker role signals.
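
The reranking step can be pictured as a weighted sum over the listed signals. The sketch below covers only that step, not the traversal. The weights, the 24-hour decay constant, and the `Candidate` fields are all assumptions chosen for illustration; the summary does not give the actual scoring function.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    similarity: float  # semantic match to the query, in [0, 1]
    age_hours: float   # time since the event was recorded
    confidence: float  # extraction confidence, in [0, 1]
    speaker: str       # e.g. "user" or "assistant"


def score(c: Candidate, preferred_speaker: str = "user") -> float:
    # Hypothetical weights; a real system would tune or learn them.
    recency = math.exp(-c.age_hours / 24.0)  # decays with age
    role = 1.0 if c.speaker == preferred_speaker else 0.0
    return 0.5 * c.similarity + 0.2 * recency + 0.2 * c.confidence + 0.1 * role


candidates = [
    Candidate("assistant guessed the user lives in Paris", 0.85, 240.0, 0.4, "assistant"),
    Candidate("user said they moved to Berlin", 0.80, 2.0, 0.9, "user"),
]
ranked = sorted(candidates, key=score, reverse=True)
print(ranked[0].text)  # user said they moved to Berlin
```

The older, low-confidence guess has slightly higher raw similarity, but recency, confidence, and speaker role push the user's own recent statement to the top.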


Production-Ready Systems

System | Key Features | Benchmark
Graphiti / Zep | Temporal KG on Neo4j, bitemporal versioning, BM25+vector+graph hybrid retrieval | 94.8% DMR, +18.5% LongMemEval
MAGMA (arXiv:2601.03236) | Decouples memory representation from retrieval logic, transparent reasoning paths | SOTA on long-horizon reasoning
Mem0 | Most widely deployed; hierarchical OS-like tiers; lacks write isolation | (none listed)

Engineering Takeaways

  • Vector-only RAG is insufficient for agents that need relational reasoning across sessions or events.
  • The GAM buffering pattern (fast local append → semantic-boundary consolidation → stable global graph) is implementable with Neo4j or FalkorDB.
  • Hybrid retrieval (BM25 + vector + graph) beats any single modality — don’t pick just one.
  • Memory types are orthogonal to storage architecture, but graphs support multi-hop relational reasoning natively.
  • If building multi-session conversational agents, plan for both episodic and semantic memory layers from the start.

Resources