brainctl ToM (Theory of Mind) tools -- test results, 2026-06-11
Summary
Found the underlying research doc in the brainctl repo (Wave 6, doc 24: "Theory of Mind
& Agent Modeling") and tested the resulting tom MCP tool against Claude's live
brain.db. All 10 subcommands work correctly end-to-end. The gap is that the system has
no automatic driver -- it's a manual instrument, not a self-maintaining one.
What we tested (all passed)
Ran the full lifecycle against F:\brain\brain.db (Claude's instance), using a
throwaway test:tom_verification topic, then cleaned up afterward:
tom update -- computes/upserts BDI snapshot (agent_bdi_state): belief counts,
staleness, task coverage, confusion risk. Works.
tom belief_set -- create and update paths both work (tested create, then update with
new content/confidence).
tom perspective_set / tom perspective_get -- observer->subject->topic model with
knowledge_gap and confusion_risk. Works, returns sorted by confusion risk.
tom belief_invalidate -- sets invalidated_at on the belief and auto-creates a
belief_conflicts row (conflict_type=staleness). Works.
tom conflicts_list / tom conflicts_resolve -- list by severity, mark resolved.
Works.
tom gap_scan -- compares an agent's active tasks against agent_beliefs, reports
MISSING/STALE/CURRENT. Works (for an agent with no active tasks it correctly returned
"Nothing to scan").
tom inject -- writes a gap-fill memory scoped to agent:<target>, upserts the
belief, and drops confusion_risk to 0.1. Works.
tom status -- ranked BDI table. Works (empty until tom update has run for an
agent, which is expected).
The consolidated tom MCP tool (action ∈ {update, belief_set, belief_invalidate,
conflicts_list, conflicts_resolve, perspective_set, perspective_get, gap_scan, inject,
status}) correctly replaces the original 10 tom_* tools per the v2.8.0 consolidation.
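
To make the gap_scan MISSING/STALE/CURRENT report concrete, here is a minimal Python
sketch of that kind of classification. It is not brainctl's implementation: the
classify_gaps name, the dict layout for beliefs, and the 7-day staleness cutoff are all
assumptions for illustration. Only the three labels and the "Nothing to scan" result
come from what we observed.

```python
from datetime import datetime, timedelta


def classify_gaps(task_topics, beliefs, now, max_age=timedelta(days=7)):
    """Label each task topic MISSING, STALE, or CURRENT.

    beliefs maps topic -> {"updated_at": datetime, "invalidated_at": datetime | None}.
    The dict layout and the 7-day max_age are illustrative assumptions.
    """
    if not task_topics:
        # Mirrors the result we saw for an agent with no active tasks.
        return "Nothing to scan"
    report = {}
    for topic in task_topics:
        belief = beliefs.get(topic)
        if belief is None:
            report[topic] = "MISSING"
        elif belief["invalidated_at"] is not None or now - belief["updated_at"] > max_age:
            report[topic] = "STALE"
        else:
            report[topic] = "CURRENT"
    return report
```

Treating an invalidated belief as STALE rather than MISSING is a choice made for this
sketch; the real tool may draw that line differently.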
What's missing
The research doc's actual payoff (section 5.1 / "Phase 4") is: when a memory write
changes ground truth, the MEB fires, and a background pass checks agent_beliefs for
any agent whose belief on that topic is now stale, creates a belief_conflicts row,
and queues a tom inject to correct it -- before the stale-belief agent acts on it.
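
A rough Python sketch of what such a pass might look like, to show how small the
missing piece is. Everything here is hypothetical: the on_memory_write handler, the
trimmed-down table columns, the default severity, and the in-memory inject queue are
assumptions, not brainctl's schema or API. Only the table names, invalidated_at, and
conflict_type=staleness come from what we saw.

```python
import sqlite3

# Hypothetical, trimmed-down schema; the real tables almost certainly have more columns.
SCHEMA = """
CREATE TABLE agent_beliefs (
    agent_id TEXT, topic TEXT, content TEXT, invalidated_at TEXT
);
CREATE TABLE belief_conflicts (
    agent_id TEXT, topic TEXT, conflict_type TEXT, severity REAL
);
"""


def on_memory_write(db, topic, new_content, inject_queue, severity=0.5):
    """Hypothetical MEB handler: flag active beliefs that this write made stale."""
    rows = db.execute(
        "SELECT agent_id FROM agent_beliefs "
        "WHERE topic = ? AND invalidated_at IS NULL AND content != ?",
        (topic, new_content),
    ).fetchall()
    for (agent_id,) in rows:
        # Record the conflict, then queue a correction for that agent.
        db.execute(
            "INSERT INTO belief_conflicts (agent_id, topic, conflict_type, severity) "
            "VALUES (?, ?, 'staleness', ?)",
            (agent_id, topic, severity),
        )
        inject_queue.append((agent_id, topic, new_content))
    db.commit()
    return len(rows)
```

Comparing content by string equality is a stand-in for whatever "belief is now stale"
test the research doc intends.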
We grepped the codebase for every agent_beliefs / belief_conflicts reference
outside mcp_tools_tom.py / mcp_tools_beliefs.py / mcp_tools_belief_merge.py. The
only other write path is memory_add --attribute, which does a different check: at
write time, it scans for other agents' memories with the same scope and logs a
belief_conflicts row (conflict_type=factual, severity=0.3) if found. That's a
same-scope cross-agent conflict check at write time -- not the MEB-triggered
staleness-vs-agent_beliefs check the research doc describes. There's no MEB hook, no
background job, and nothing populates agent_beliefs automatically for any agent.
Net effect: the four ToM tables (agent_beliefs, belief_conflicts,
agent_perspective_models, agent_bdi_state) exist and the API is fully functional,
but they stay empty forever unless an agent deliberately calls tom belief_set /
tom gap_scan / tom inject as part of its own workflow. Nothing currently prompts
that.
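
One quick way to confirm the "stay empty" observation on another instance is to count
rows in the four tables. A minimal Python sketch, assuming brain.db is a plain SQLite
file that the standard sqlite3 module can open (we have not verified this). The
tom_row_counts helper is ours, not part of brainctl.

```python
import sqlite3

TOM_TABLES = (
    "agent_beliefs",
    "belief_conflicts",
    "agent_perspective_models",
    "agent_bdi_state",
)


def tom_row_counts(db):
    """Return {table: row count}, or None for a table that does not exist."""
    existing = {
        name
        for (name,) in db.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    counts = {}
    for table in TOM_TABLES:
        if table in existing:
            # Table names come from the fixed tuple above, not from user input.
            counts[table] = db.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        else:
            counts[table] = None
    return counts
```

To run it against a real instance, pass in a connection opened on the database path,
for example sqlite3.connect(r"F:\brain\brain.db") on our machine.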
Recommendation
Not a bug or an urgent feature request -- a "here's what we found when we tried to use
it" report:
- If the MEB->ToM hook (Phase 4) is still planned, it's the only piece that would make
this self-sustaining. Without it, ToM is a manual API.
- If it's not planned soon, it might be worth a doc note (in the tom tool descriptions
or elsewhere) clarifying that this is a manually-driven belief-tracking layer, so users
don't expect automatic staleness detection.
- Scale note: the research doc's framing (~178 agents, high coordination overhead) is
for a much larger swarm than our household (~5 named agents coordinating via chat
room + brainctl handoffs). At our scale, the kind of "agent X is acting on agent Y's
stale belief" problem this solves may not come up often enough to justify
building/maintaining the automation. We're not asking for Phase 4 to be prioritized
on our account -- just flagging that the tool as it stands is "works if you drive it
manually," and we likely won't be driving it without the automatic hook.
Reproduction
Brainctl version: as of commit adc261c (BRAINCTL_TOOL_GROUPS / THE-30). Tested via the
consolidated mcp__brainctl__tom tool against F:\brain\brain.db.