Exabase Achieves Highest Reported Score on Leading AI Memory Benchmark Using a Smaller, Cheaper Model

PR Newswire
Today at 12:38pm UTC

Exabase's new memory engine M-1 reaches 96.4% on LongMemEval with Gemini 3 Flash, outperforming all systems that used the larger Gemini 3 Pro

LONDON, May 26, 2026 /PRNewswire/ -- As AI agents move from experiments to production systems, long-term memory has emerged as a critical infrastructure challenge. Existing approaches often rely on large, expensive models to compensate for weak retrieval, creating systems that perform well on benchmarks but are impractical to deploy at scale.

Exabase, a data infrastructure platform for AI agents, today announced that its memory engine M-1 has achieved the highest reported score on LongMemEval, the leading public benchmark for conversational AI memory. M-1 scored 96.4% accuracy at top-50 retrieval, surpassing every previously reported system, including Mem0 (94.8%), Honcho (92.6%), HydraDB (90.79%), and Supermemory (85.2%).

Notably, M-1 achieved this using Gemini 3 Flash, a model that is faster and four to six times cheaper than Gemini 3 Pro, which was used by every other leading system on the benchmark.

"M-1 was designed for production from the start. The memory architecture does the heavy lifting, which means you get better results with a cheaper, faster model," said Jonathan Bree, founder of Exabase. "That's what makes the difference between a benchmark result and a production system."

LongMemEval, developed by Wang et al. (2024), evaluates systems across 500 questions and approximately 115,000 tokens of conversational history, testing six capabilities: recalling user facts, recalling preferences, recalling assistant-provided information, multi-session reasoning, temporal reasoning, and knowledge updates. It has become the standard benchmark for evaluating AI memory systems.
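
For readers unfamiliar with how a benchmark of this shape is typically scored, the following is a minimal sketch only. It is not Exabase's evaluation code or the LongMemEval authors' harness. It assumes each graded question carries a capability label and a correct/incorrect judgment; the label strings, the GradedAnswer type, and the accuracy_by_capability function are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for the six capabilities listed above.
CAPABILITIES = (
    "user_facts",
    "preferences",
    "assistant_information",
    "multi_session_reasoning",
    "temporal_reasoning",
    "knowledge_updates",
)


@dataclass(frozen=True)
class GradedAnswer:
    """One benchmark question after grading (illustrative structure)."""
    capability: str
    correct: bool


def accuracy_by_capability(answers):
    """Return (overall, per_capability) accuracy as fractions in [0, 1]."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for answer in answers:
        if answer.capability not in CAPABILITIES:
            raise ValueError(f"unknown capability: {answer.capability}")
        totals[answer.capability] += 1
        hits[answer.capability] += int(answer.correct)
    if not totals:
        raise ValueError("no graded answers supplied")
    per_capability = {cap: hits[cap] / totals[cap] for cap in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_capability


# Made-up grades for illustration; these are not benchmark results.
sample = [
    GradedAnswer("user_facts", True),
    GradedAnswer("user_facts", False),
    GradedAnswer("temporal_reasoning", True),
]
overall, per_capability = accuracy_by_capability(sample)
```

Reporting the per-capability breakdown alongside the overall figure shows whether a headline score is driven by a few easy categories or holds up across all six.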

M-1's retrieval architecture was developed in collaboration with Hyperplane Labs (https://hyperplanelabs.ai), a European applied research laboratory focused on cognitive AI architectures. The system draws on principles from episodic memory theory, reconstructive recall, and temporal context models, treating memory as a reconstructive process rather than a keyword search.

The evaluation used a single generic prompt across all 500 questions, with full methodology and results publicly available.

M-1 is deployed in production, powering memory and search in products like Fabric (https://fabric.so), an AI workspace and personal data platform with over 300,000 users. The memory API is available to developers through the Exabase platform (https://exabase.io).

The full research paper, comparative results, and downloadable data are available at: https://exabase.io/research/exabase-achieves-state-of-the-art-on-longmemeval-benchmark

About Exabase
Exabase is a data infrastructure platform for AI agents, providing memory, storage, search, and extraction capabilities. Built from the infrastructure behind Fabric, Exabase enables developers to build agents that persist, learn, and operate across sessions. Learn more at https://exabase.io.

About Hyperplane Labs
Hyperplane Labs is a European applied research laboratory focused on cognitive AI architectures. Learn more at https://hyperplanelabs.ai.

About Fabric
Fabric is a personal AI workspace and cloud computer where everything you save, write, capture, or record lives in one intelligent space. Learn more at https://fabric.so.

Contact:
Jonathan Bree
Founder, Exabase
415153@email4pr.com
+1 302-628-6654

View original content to download multimedia: https://www.prnewswire.com/news-releases/exabase-achieves-highest-reported-score-on-leading-ai-memory-benchmark-using-a-smaller-cheaper-model-302780919.html

SOURCE Exabase