ToM tools tested working -- Phase 4 auto-staleness hook not implemented #156

@crystalwizard

brainctl ToM (Theory of Mind) tools -- test results, 2026-06-11

Summary

Found the underlying research doc in the brainctl repo (Wave 6, doc 24: "Theory of Mind
& Agent Modeling") and tested the resulting tom MCP tool against Claude's live
brain.db. All 10 subcommands work correctly end-to-end. The gap is that the system has
no automatic driver -- it's a manual instrument, not a self-maintaining one.

What we tested (all passed)

Ran the full lifecycle against F:\brain\brain.db (Claude's instance), using a
throwaway test:tom_verification topic, then cleaned up afterward:

  • tom update -- computes/upserts BDI snapshot (agent_bdi_state): belief counts,
    staleness, task coverage, confusion risk. Works.
  • tom belief_set -- create and update paths both work (tested create, then update with
    new content/confidence).
  • tom perspective_set / tom perspective_get -- observer->subject->topic model with
    knowledge_gap and confusion_risk. Works, returns sorted by confusion risk.
  • tom belief_invalidate -- marks belief invalidated_at, auto-creates a
    belief_conflicts row (conflict_type=staleness). Works.
  • tom conflicts_list / tom conflicts_resolve -- list by severity, mark resolved.
    Works.
  • tom gap_scan -- compares an agent's active tasks against agent_beliefs, reports
    MISSING/STALE/CURRENT. Works (for an agent with no active tasks it correctly returned
    "Nothing to scan").
  • tom inject -- writes a gap-fill memory scoped to agent:<target>, upserts the
    belief, and drops confusion_risk to 0.1. Works.
  • tom status -- ranked BDI table. Works (empty until tom update has run for an
    agent, which is expected).
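
For readers unfamiliar with the gap_scan output, here is a minimal sketch of the MISSING/STALE/CURRENT classification described above. It is not brainctl's code: the function name, the belief dictionary shape, and the 7-day staleness window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical staleness window; the threshold gap_scan really uses is not stated here.
STALE_AFTER = timedelta(days=7)


def classify_gaps(task_topics, beliefs, now):
    """Map each active-task topic to MISSING, STALE, or CURRENT.

    beliefs is an assumed shape: {topic: datetime of the belief's last update}.
    """
    if not task_topics:
        # Mirrors the message we saw for an agent with no active tasks.
        return "Nothing to scan"
    report = {}
    for topic in task_topics:
        updated_at = beliefs.get(topic)
        if updated_at is None:
            report[topic] = "MISSING"   # no belief recorded for this topic
        elif now - updated_at > STALE_AFTER:
            report[topic] = "STALE"     # belief exists but is older than the window
        else:
            report[topic] = "CURRENT"
    return report
```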

The consolidated tom MCP tool (action ∈ {update, belief_set, belief_invalidate,
conflicts_list, conflicts_resolve, perspective_set, perspective_get, gap_scan, inject,
status}) correctly replaces the original 10 tom_* tools per the v2.8.0 consolidation.

What's missing

The research doc's actual payoff (section 5.1 / "Phase 4") is: when a memory write
changes ground truth, the MEB fires, and a background pass checks agent_beliefs for
any agent whose belief on that topic is now stale, creates a belief_conflicts row,
and queues a tom inject to correct it -- before the stale-belief agent acts on it.
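
To make the missing piece concrete, here is a rough sketch of what such a handler could look like. It is an illustration only, not the research doc's design: the handler name, the queue format, and every column other than invalidated_at and conflict_type are assumptions, and the demo schema is deliberately simplified.

```python
import sqlite3


def create_demo_schema(conn):
    # Simplified, assumed columns; the real brain.db schema may differ.
    conn.executescript("""
        CREATE TABLE agent_beliefs (
            id INTEGER PRIMARY KEY,
            agent_id TEXT, topic TEXT,
            updated_at TEXT, invalidated_at TEXT
        );
        CREATE TABLE belief_conflicts (
            id INTEGER PRIMARY KEY,
            belief_id INTEGER, conflict_type TEXT
        );
    """)


def on_memory_write(conn, topic, written_at):
    """Hypothetical MEB handler: flag active beliefs on `topic` older than the write.

    Timestamps are ISO-8601 strings in one format, so string comparison orders them.
    """
    stale = conn.execute(
        "SELECT id, agent_id FROM agent_beliefs "
        "WHERE topic = ? AND invalidated_at IS NULL AND updated_at < ? "
        "ORDER BY id",
        (topic, written_at),
    ).fetchall()
    queued = []
    for belief_id, agent_id in stale:
        conn.execute(
            "INSERT INTO belief_conflicts (belief_id, conflict_type) "
            "VALUES (?, 'staleness')",
            (belief_id,),
        )
        # Stand-in for queueing a tom inject; the real queue mechanism is unknown.
        queued.append({"action": "inject", "agent": agent_id, "topic": topic})
    conn.commit()
    return queued
```

The point of the sketch is the trigger: the check runs because a memory was written, not because an agent chose to call a tom action.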

We grepped the codebase for every agent_beliefs / belief_conflicts reference
outside mcp_tools_tom.py / mcp_tools_beliefs.py / mcp_tools_belief_merge.py. The
only other write path is memory_add --attribute, which does a different check: at
write time, it scans for other agents' memories with the same scope and logs a
belief_conflicts row (conflict_type=factual, severity=0.3) if found. That's a
same-scope cross-agent conflict check at write time -- not the MEB-triggered
staleness-vs-agent_beliefs check the research doc describes. There's no MEB hook, no
background job, and nothing populates agent_beliefs automatically for any agent.
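
For contrast, the write-time check we did find behaves roughly like the sketch below. This is a paraphrase of the observed behavior, not the actual memory_add code: the function name and the memory/row dictionary shapes are assumptions, while conflict_type=factual and severity=0.3 come from what we saw.

```python
def same_scope_conflicts(memories, writer, scope):
    """Return belief_conflicts-style rows for other agents' memories sharing `scope`.

    Note that agent_beliefs is never consulted; the check keys on scope alone.
    """
    return [
        {
            "conflict_type": "factual",
            "severity": 0.3,
            "agent_a": writer,        # hypothetical field names
            "agent_b": m["agent"],
            "scope": scope,
        }
        for m in memories
        if m["scope"] == scope and m["agent"] != writer
    ]
```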

Net effect: the four ToM tables (agent_beliefs, belief_conflicts,
agent_perspective_models, agent_bdi_state) exist and the API is fully functional,
but they stay empty forever unless an agent deliberately calls tom belief_set /
tom gap_scan / tom inject as part of its own workflow. Nothing currently prompts
that.
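
One way to confirm this on any instance is to count rows in the four tables directly. A small sketch, assuming only that the table names above exist in the database; the helper name is ours.

```python
import sqlite3

TOM_TABLES = (
    "agent_beliefs",
    "belief_conflicts",
    "agent_perspective_models",
    "agent_bdi_state",
)


def tom_row_counts(conn):
    """Return {table_name: row_count} for the four ToM tables."""
    # Table names are the fixed constants above, so interpolating them is safe.
    return {
        table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in TOM_TABLES
    }
```

Against a real instance this would be called as `tom_row_counts(sqlite3.connect(r"F:\brain\brain.db"))`; all zeros means nothing has driven the ToM layer yet.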

Recommendation

Not a bug or an urgent feature request -- a "here's what we found when we tried to use
it" report:

  1. If the MEB->ToM hook (Phase 4) is still planned, it's the only piece that would make
    this self-sustaining. Without it, ToM is a manual API.
  2. If it's not planned soon, it may be worth a doc note (for example in the tom tool
    descriptions) clarifying that this is a manually-driven belief-tracking layer, so users
    don't expect automatic staleness detection.
  3. Scale note: the research doc's framing (~178 agents, high coordination overhead) is
    for a much larger swarm than our household (~5 named agents coordinating via chat
    room + brainctl handoffs). At our scale, the kind of "agent X is acting on agent Y's
    stale belief" problem this solves may not come up often enough to justify
    building/maintaining the automation. We're not asking for Phase 4 to be prioritized
    on our account -- just flagging that the tool as it stands is "works if you drive it
    manually," and we likely won't be driving it without the automatic hook.

Reproduction

Brainctl version: as of commit adc261c (BRAINCTL_TOOL_GROUPS / THE-30). Tested via
mcp__brainctl__tom consolidated tool against F:\brain\brain.db.
