Suggestion: Add reproducible memory evaluation benchmarks

Hi team, congrats on the open-source release! The layered L0→L3 architecture is really interesting.

I noticed your benchmarks show impressive improvements (48%→76% on PersonaMem accuracy). For the community to reproduce and build on these results, would you consider adding a standardized evaluation module?

I've been working on [MemTest](https://github.com/yubingz/memtest), a benchmark database design system for AI memory evaluation. It takes a different approach from typical eval frameworks. Instead of providing evaluation metrics, it provides **test databases** that stress-test different aspects of memory retrieval:

- **Storage integrity**: Can the system store and retrieve all variants of a memory?
- **Retrieval precision**: 5 query types (person/location/event/time/composite) with temporal reasoning
- **Memory clustering**: Are related memories grouped together?
- **Forgetting directionality**: High-frequency vs low-frequency recall
- **Reasoning**: Multi-hop chain queries across memories
- **Deep retrieval**: Recall decay over near/mid/far semantic distance
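To make this concrete, here's a rough sketch of how a retrieval-precision and deep-retrieval test could be scored. The `TestQuery` schema, the `retrieve(text, k)` callback, and the recall@k metric are all hypothetical choices for illustration. This is not MemTest's actual data format or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


# Hypothetical test-case schema (illustrative only, not MemTest's real format).
@dataclass
class TestQuery:
    query_id: str
    text: str
    query_type: str         # e.g. "person", "location", "event", "time", "composite"
    expected_ids: set[str]  # memory IDs a correct system should return
    distance: str = "near"  # semantic distance bucket: "near", "mid", "far"


def recall_at_k(retrieved: list[str], expected: set[str], k: int) -> float:
    """Fraction of expected memory IDs that appear in the top-k retrieved IDs."""
    if not expected:
        return 1.0
    hits = len(set(retrieved[:k]) & expected)
    return hits / len(expected)


def evaluate(queries: list[TestQuery],
             retrieve: Callable[[str, int], list[str]],
             k: int = 5) -> dict[tuple[str, str], float]:
    """Average recall@k grouped by (query_type, distance bucket)."""
    scores: dict[tuple[str, str], list[float]] = defaultdict(list)
    for q in queries:
        retrieved = retrieve(q.text, k)
        scores[(q.query_type, q.distance)].append(
            recall_at_k(retrieved, q.expected_ids, k))
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


if __name__ == "__main__":
    # Toy in-memory store and a naive keyword-overlap retriever, for demonstration only.
    memory_store = {
        "m1": "Lin Daiyu arrives at the Rongguo mansion",
        "m2": "Sun Wukong travels west with Tang Sanzang",
    }

    def keyword_retrieve(text: str, k: int) -> list[str]:
        words = set(text.lower().split())
        ranked = sorted(
            memory_store,
            key=lambda mid: -len(words & set(memory_store[mid].lower().split())))
        return ranked[:k]

    queries = [TestQuery("q1", "Who travels with Tang Sanzang", "person", {"m2"}, "near")]
    print(evaluate(queries, keyword_retrieve, k=1))
```

Grouping by distance bucket is what would expose recall decay from near to far queries; a real harness would plug in the system's own retrieval call instead of the toy retriever.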

It also includes a corpus-driven builder that generates test databases from any text corpus. We used it with the Four Great Classical Novels of Chinese literature to generate 21,793 memories and 750 queries.
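As a very simplified illustration of the corpus-driven idea (not how the MemTest builder actually works), one could split a corpus into sentence-level memories and derive person-type queries from a supplied entity list. The `build_test_db` function, the sentence splitter, and the query template are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Memory:
    memory_id: str
    text: str


@dataclass
class Query:
    text: str
    query_type: str
    expected_ids: set[str]


def build_test_db(corpus: str, entities: list[str]) -> tuple[list[Memory], list[Query]]:
    """Toy builder: one memory per sentence, one 'person' query per entity that appears."""
    # Split after Western or CJK sentence-ending punctuation, dropping empty pieces.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", corpus) if s.strip()]
    memories = [Memory(f"m{i}", s) for i, s in enumerate(sentences)]
    queries = []
    for name in entities:
        # Ground truth: every memory whose text mentions the entity.
        expected = {m.memory_id for m in memories if name in m.text}
        if expected:
            queries.append(Query(f"What happened involving {name}?", "person", expected))
    return memories, queries


if __name__ == "__main__":
    corpus = "Liu Bei swears brotherhood with Guan Yu. Guan Yu crosses five passes. Sun Wukong rebels."
    memories, queries = build_test_db(corpus, ["Guan Yu", "Sun Wukong"])
    for q in queries:
        print(q.text, sorted(q.expected_ids))
```

A real builder would need proper entity extraction, temporal annotations, and distractor generation; this only shows why ground truth can be derived automatically from the corpus itself.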

For a layered architecture like yours, this could be especially useful for checking whether each query is served from the correct layer (L0, L1, L2, or L3) under different query patterns.
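A layer-routing check could look roughly like the sketch below. It assumes a hypothetical `retrieve(query)` hook that returns `(memory_id, layer)` pairs ranked best-first, and test cases annotated with the layer that should answer them. I don't know how your layers are exposed internally, so the interface is a placeholder.

```python
from collections import Counter
from typing import Callable

# Hypothetical interface: retrieve(query_text) -> [(memory_id, layer), ...], best first.
Retriever = Callable[[str], list[tuple[str, str]]]


def layer_routing_report(cases: list[tuple[str, str]],
                         retrieve: Retriever) -> tuple[float, Counter]:
    """cases: (query_text, expected_layer) pairs.

    Returns top-1 layer accuracy and a confusion Counter keyed by
    (expected_layer, actual_layer); "none" marks an empty result list.
    """
    confusion: Counter = Counter()
    correct = 0
    for text, expected_layer in cases:
        results = retrieve(text)
        got_layer = results[0][1] if results else "none"
        confusion[(expected_layer, got_layer)] += 1
        correct += got_layer == expected_layer
    accuracy = correct / len(cases) if cases else 0.0
    return accuracy, confusion


if __name__ == "__main__":
    # Canned results standing in for a real system.
    fake_results = {
        "q1": [("m1", "L0")],
        "q2": [("m9", "L2")],
        "q3": [],
    }
    acc, confusion = layer_routing_report(
        [("q1", "L0"), ("q2", "L3"), ("q3", "L1")],
        lambda text: fake_results[text])
    print(f"layer accuracy: {acc:.2f}")
    print(confusion)
```

The confusion counts are the interesting part: they would show, for example, whether far-distance queries that should hit a deeper layer get answered from a shallower one instead.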

Would this kind of standardized benchmarking be useful for your project? Happy to help draft a starter integration.

Thanks!

Suggestion: Add reproducible memory evaluation benchmarks #106