Architecture

mrdulasolutions edited this page May 25, 2026 · 1 revision

The design rationale behind box-memory.

The problem we're solving

AI agents need both persistent memory and persistent file storage. Today's options fall into two camps:

  • Vector-store / RAG tools (Mem0, Supermemory, ChatGPT Memories) chunk files into embeddings, lose provenance, drift across model versions, and treat files as second-class metadata behind chunks. That makes them the wrong fit for anything regulated.
  • Markdown-vault tools (Obsidian + iCloud / Git) keep files whole, but they are single-user, with no compliance controls and no ACLs.

box-memory picks the Markdown-vault data model (whole files, exact recall) and runs it on Box's substrate (compliance, ACLs, retention, durability). The plugin layers the agent-memory primitives (schema, index, companions, recall) on top.

Why Box specifically

Box is the only major file-storage platform that offers all of the following:

  • Certified for regulated workloads — SOC 2 on Business+, HIPAA BAA + FedRAMP Moderate on Enterprise, FedRAMP High + DoD IL4 + ITAR on Enterprise Plus / Government
  • API-first with immutable file IDs — Box file IDs never change across renames, moves, or version updates. The plugin keys everything on these IDs.
  • Native CAD / Office / image / PDF previews — agents can show "this is the file" without a separate viewer service
  • Folder ACLs that actually work — multi-team isolation is a folder permission, not an application convention
  • Real Box AI — Ask, Extract, AI Studio, Hubs Q&A. Server-side, tier-gated, but high-quality.
  • Working official MCP at mcp.box.com — Claude can talk to it natively

What Box is bad at: free-text search. The Search API has a roughly 10-minute indexing lag. This is the central design constraint of box-memory, and the plugin's index files exist specifically to route around it.

Core design decisions

1. Box file IDs are the only reliable identifier

Filenames change. Wikilinks rot. Slugs collide. Box file IDs are permanent.

The plugin uses two primary keys:

  • memory ID (mem_<ulid>) — agent-facing
  • Box file ID — substrate-facing

The per-folder _index.json maps between them.
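As a minimal sketch of that mapping, assuming a hypothetical _index.json layout (the entries, memory_id, box_file_id, and slug field names are illustrative, not the plugin's documented schema, and the IDs are placeholders), a two-way lookup could be built like this:

```python
import json

# Hypothetical _index.json content; field names and IDs are assumptions.
INDEX_JSON = """
{
  "entries": [
    {"memory_id": "mem_01HZX3V7K8Q2N5R4T6W9Y0ABCD",
     "box_file_id": "1234567890", "slug": "pricing-decision"},
    {"memory_id": "mem_01HZX3V9M1P4S6V8X0Z2B3EFGH",
     "box_file_id": "1234567891", "slug": "vendor-review"}
  ]
}
"""


def build_lookups(index_text):
    """Return (memory_id -> box_file_id, box_file_id -> memory_id) dicts."""
    entries = json.loads(index_text)["entries"]
    by_memory = {e["memory_id"]: e["box_file_id"] for e in entries}
    by_file = {e["box_file_id"]: e["memory_id"] for e in entries}
    return by_memory, by_file


by_memory, by_file = build_lookups(INDEX_JSON)
```

With both dictionaries in hand, an agent can resolve a memory ID to the file it should fetch, and a Box-side event that carries only a file ID can be traced back to the memory it belongs to.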

2. Append-only memories

Agents don't overwrite. To change a position, write a new memory and mark the old one status: superseded with superseded_by: <new-id>. History comes for free, and the model matches how humans think.
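The supersede step can be sketched as follows. This is an illustration under assumptions: memories are modeled as plain dictionaries, the "active" status value and the function names are hypothetical, and only the status and superseded_by fields come from the description above.

```python
def supersede(memories, old_id, new_memory):
    """Return a new list: old record marked superseded, new record appended."""
    if not any(m["id"] == old_id for m in memories):
        raise KeyError(f"unknown memory id: {old_id}")
    updated = []
    for m in memories:
        if m["id"] == old_id:
            # Only the status fields change; the old body is preserved.
            m = {**m, "status": "superseded", "superseded_by": new_memory["id"]}
        updated.append(m)
    # "active" is an assumed status value for the live record.
    updated.append({**new_memory, "status": "active"})
    return updated


def resolve_current(memories, memory_id):
    """Follow superseded_by links until reaching a record that is not superseded."""
    by_id = {m["id"]: m for m in memories}
    current = by_id[memory_id]
    while current.get("status") == "superseded":
        current = by_id[current["superseded_by"]]
    return current


history = [{"id": "mem_A", "status": "active", "body": "Use vendor X"}]
history = supersede(history, "mem_A", {"id": "mem_B", "body": "Use vendor Y"})
```

Nothing is deleted: the old record keeps its body, and following the superseded_by chain from any historical ID leads to the current position.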

3. The index-file pattern — bypasses Box's search lag

Box search has a roughly 10-minute indexing lag, so every memory write also updates a per-folder _index.json, which makes recall instant. Read order:

  1. Known file ID → direct fetch
  2. Known folder + slug/title/kind/tag → _index.json
  3. Cross-folder → workspace-root rollup _index.json
  4. Business+ → Metadata Query via search_files_keyword + mdfilters
  5. Last resort → Box Search (with a stale-result warning)
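The read order above can be sketched as a simple fallback chain. Everything here is hypothetical scaffolding: the indexes are modeled as slug-to-file-ID dictionaries, and metadata_query and box_search are stand-in callables for whatever Box MCP tool calls the plugin actually makes.

```python
from dataclasses import dataclass


@dataclass
class RecallResult:
    file_id: str
    source: str
    may_be_stale: bool = False


def recall(slug, file_id=None, folder_index=None, root_index=None,
           metadata_query=None, box_search=None):
    """Try the cheapest, freshest lookup first and fall back step by step."""
    if file_id is not None:                      # 1. known file ID
        return RecallResult(file_id, "direct")
    if folder_index and slug in folder_index:    # 2. per-folder _index.json
        return RecallResult(folder_index[slug], "folder-index")
    if root_index and slug in root_index:        # 3. workspace-root rollup
        return RecallResult(root_index[slug], "root-index")
    if metadata_query is not None:               # 4. only supplied on Business+
        hit = metadata_query(slug)
        if hit is not None:
            return RecallResult(hit, "metadata-query")
    if box_search is not None:                   # 5. last resort, may lag
        hit = box_search(slug)
        if hit is not None:
            return RecallResult(hit, "box-search", may_be_stale=True)
    return None


folder = {"pricing-decision": "111"}
root = {"pricing-decision": "111", "vendor-review": "222"}
from_folder = recall("pricing-decision", folder_index=folder, root_index=root)
from_root = recall("vendor-review", folder_index=folder, root_index=root)
from_search = recall("old-note", folder_index=folder, root_index=root,
                     box_search=lambda s: "333")
```

Only the final step sets may_be_stale, which mirrors the stale-result warning attached to Box Search in the list above.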

4. Companions instead of chunking

Binaries don't get chunked, embedded, or indexed. They get a paired companion .md written by the agent that last reviewed them.

report.pdf              ← binary, never touched
report.pdf.md           ← companion, agent-written, describes the binary

Companion frontmatter pins the description to a specific version via companion_for.sha256. If the binary changes, the companion is stale — the plugin detects this.
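A staleness check along these lines could look like the sketch below. It assumes the companion frontmatter has already been parsed into a dictionary with a nested companion_for.sha256 value holding a hex digest; the function names and sample bytes are illustrative only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the binary's bytes."""
    return hashlib.sha256(data).hexdigest()


def companion_is_stale(binary_bytes: bytes, frontmatter: dict) -> bool:
    """True when the binary no longer matches the hash pinned in the companion."""
    pinned = frontmatter["companion_for"]["sha256"]
    return sha256_hex(binary_bytes) != pinned


# The companion was written against the first version of the binary.
v1 = b"%PDF-1.7 first version"
v2 = b"%PDF-1.7 second version"
frontmatter = {"companion_for": {"sha256": sha256_hex(v1)}}

fresh = companion_is_stale(v1, frontmatter)
stale = companion_is_stale(v2, frontmatter)
```

Because the comparison is on content rather than filename or modification time, a rename leaves the companion valid while any byte-level change to the binary marks it stale.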

On Business+, Box AI Extract Structured runs OCR for PDFs / TIFF / PNG / JPEG and fills companion fields automatically. On Personal, agents do their best with what they can read locally.

5. Tier-aware, never tier-gated

Every feature has a working path on every Box tier — Business+ just gets faster paths. Personal users still get full memory + recall + companion functionality via the index pattern.

6. Folder ACLs are the multi-team boundary

Frontmatter team: is a hint. Folder ACLs are enforcement. The plugin creates the folder structure; you set ACLs in Box's web UI.

7. Box MCP is the auth boundary

The plugin doesn't manage Box authentication. It invokes the Box MCP tools the user has connected — typically the official remote MCP at mcp.box.com. If your agent acts as Alice, the plugin acts as Alice. Folder ACLs are enforced naturally.

What's deliberately out of scope

  • Embeddings, RAG, vector stores — Box AI Hubs Q&A does this server-side for us when needed. We don't ship our own.
  • Box auth flows — handled by the user's Box MCP.
  • Sync to other vaults — Box is the system of record.
  • Versioning UI — Box has native version history.
  • Box Skills (the framework): treated as a dead end, per Operational Notes.

Two variants

| Variant | Repo | Backend | Best for |
| --- | --- | --- | --- |
| box-memory (this) | mrdulasolutions/BOX | Box MCP / network | Any device with internet; full Box AI access |
| box-memory-onprem | mrdulasolutions/BOX-Onprem | Local Box Drive filesystem | Zero outbound calls to Box; HIPAA / FedRAMP / ITAR data path |

Both variants share the same workspace schema, so a workspace created with one can be opened by the other.

Failure modes the plugin handles

| Failure | Plugin response |
| --- | --- |
| Box MCP not connected | Clear error directing to Settings → Connectors → Box at mcp.box.com |
| Filename collision (409) | Append -2, -3 etc. to filename; slug stable; index reflects |
| Index drift | box-index-rebuild regenerates from source |
| Wikilink dangling | Treated as forward reference; not an error |
| Companion hash mismatch | Detected on read; mark stale; offer to regenerate |
| Search returns stale | Index lookup is primary; search is fallback |
| Tier downgrade | Detect on next write; switch to index-only |
| Stale OAuth after tier upgrade | Operational Notes Note 2: reconnect MCP |
| Hub indexing warm-up | Operational Notes Note 7: fall through to file-set Q&A |
| search_files_metadata broken | Operational Notes Note 1: use search_files_keyword + mdfilters instead |
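The filename-collision response can be illustrated with a small helper. This is a sketch under assumptions: the function name is hypothetical, the set of existing names stands in for a folder listing, and placing the numeric suffix before the final extension is a guess at the convention, not confirmed behavior.

```python
import os


def next_free_name(filename, existing):
    """Return filename, or filename with -2, -3, ... added until it is unused."""
    if filename not in existing:
        return filename
    # Assumption: the suffix goes before the last extension (note-2.md).
    stem, ext = os.path.splitext(filename)
    n = 2
    while f"{stem}-{n}{ext}" in existing:
        n += 1
    return f"{stem}-{n}{ext}"


taken = {"note.md", "note-2.md"}
unused = next_free_name("other.md", taken)
bumped = next_free_name("note.md", taken)
```

Only the filename changes here. The slug recorded in the index would stay the same, which matches the "slug stable" behavior described in the table.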
