docs: mark v0.25.0 as the latest release on the landing page by dfrostar · Pull Request #231 · dfrostar/neuralmind

dfrostar · 2026-06-12T05:51:17Z

v0.25.0 shipped, so the landing page's "latest" markers move forward, mirroring what #224 did for v0.23.0:

Hero badge: v0.24.0 → v0.25.0 (and its release-notes link)
JSON-LD softwareVersion: 0.24.0 → 0.25.0
Timeline: v0.25.0 takes the "Latest release" pill (was "In development"); v0.24.0 joins the plain history trail

No other content changes.

https://claude.ai/code/session_01FkHXHcjpWZL2EWn4HGi547

v0.25.0 shipped, so the hero badge, JSON-LD softwareVersion, and the timeline pills move forward: v0.25.0 takes the "Latest release" pill and v0.24.0 joins the plain history trail. https://claude.ai/code/session_01FkHXHcjpWZL2EWn4HGi547

github-actions · 2026-06-12T05:52:33Z

Backend parity gate — graphify vs built-in tree-sitter

✅ PASS — the built-in backend must stay within tolerance of graphify on the reference fixture.

Metric	graphify	built-in
code nodes	65	79
mean reduction	6.05×	6.66×
faithfulness delta	-0.047	+0.143
fact recall	0.527	0.717
grounding	0.889	1.000

Gate checks

✅ reduction within tolerance of graphify — built-in 6.66× ≥ 4.54× (graphify 6.05× − 25%)
✅ reduction ≥ absolute floor — built-in 6.66× ≥ floor 4.00×
✅ faithfulness delta within tolerance of graphify — built-in +0.143 ≥ -0.147 (graphify -0.047 − 0.10)
✅ faithfulness delta ≥ absolute floor — built-in +0.143 ≥ floor +0.000
✅ fact recall within tolerance of graphify — built-in 0.717 ≥ 0.427 (graphify 0.527 − 0.10)

Tolerances: reduction within 25% (floor 4.0×), faithfulness within 0.10 (floor +0.00). Override via NEURALMIND_PARITY_* env vars.
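
The thresholds in the gate checks above can be re-derived by hand. The following is a minimal sketch of that arithmetic, not the code in evals/parity/run.py: the function and constant names are hypothetical, and only the tolerance values (25%, 0.10), the floors (4.0×, +0.00), and the metric values are taken from the report.

```python
# Hypothetical re-derivation of the gate thresholds; names are illustrative only.
REDUCTION_TOLERANCE = 0.25   # relative slack: built-in may trail graphify by 25%
REDUCTION_FLOOR = 4.0        # absolute floor on mean reduction
ABS_TOLERANCE = 0.10         # absolute slack for faithfulness delta and fact recall
FAITHFULNESS_FLOOR = 0.0     # absolute floor on faithfulness delta


def relative_threshold(reference: float, tolerance: float) -> float:
    """Lowest acceptable value when the slack is a fraction of the reference."""
    return reference * (1.0 - tolerance)


def absolute_threshold(reference: float, tolerance: float) -> float:
    """Lowest acceptable value when the slack is a fixed amount."""
    return reference - tolerance


def gate(graphify: dict, builtin: dict) -> dict:
    """Evaluate the five checks listed in the report; True means the check passes."""
    return {
        "reduction_vs_graphify": builtin["reduction"]
        >= relative_threshold(graphify["reduction"], REDUCTION_TOLERANCE),
        "reduction_floor": builtin["reduction"] >= REDUCTION_FLOOR,
        "faithfulness_vs_graphify": builtin["faithfulness"]
        >= absolute_threshold(graphify["faithfulness"], ABS_TOLERANCE),
        "faithfulness_floor": builtin["faithfulness"] >= FAITHFULNESS_FLOOR,
        "recall_vs_graphify": builtin["recall"]
        >= absolute_threshold(graphify["recall"], ABS_TOLERANCE),
    }


# Metric values copied from the table above.
graphify = {"reduction": 6.05, "faithfulness": -0.047, "recall": 0.527}
builtin = {"reduction": 6.66, "faithfulness": 0.143, "recall": 0.717}
checks = gate(graphify, builtin)
print(checks)
```

Reduction uses a relative tolerance while faithfulness and fact recall use an absolute one, which is why the first threshold is 6.05 × 0.75 and the others are simple subtractions.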

Automated by evals/parity/run.py — reproduce locally with python -m evals.parity.run.

Multi-language structural parity

Language	graphify symbols	built-in covers	dangling
typescript	54	54 (100%)	0
go	45	45 (100%)	0

✅ typescript: symbol coverage ≥ floor — 54/54 graphify symbols (100%) ≥ 90%
✅ typescript: no dangling edges — 0 dangling edge(s)
✅ go: symbol coverage ≥ floor — 45/45 graphify symbols (100%) ≥ 90%
✅ go: no dangling edges — 0 dangling edge(s)

Coverage floor: 90% of graphify's per-language symbols (no gold-fact set exists for TS/Go, so parity is structural).
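
A structural check of this kind can be expressed with plain set operations. This is a sketch under assumptions: the report does not show how symbols or edges are represented, so the string identifiers, the `(source, target)` edge tuples, and both function names are hypothetical.

```python
COVERAGE_FLOOR = 0.90  # from the report: 90% of graphify's per-language symbols


def symbol_coverage(reference_symbols: set, builtin_symbols: set) -> float:
    """Fraction of the reference backend's symbols that the built-in backend also emits."""
    if not reference_symbols:
        return 1.0  # nothing to cover
    return len(reference_symbols & builtin_symbols) / len(reference_symbols)


def dangling_edges(nodes: set, edges: list) -> list:
    """Edges whose source or target is not a known node."""
    return [(src, dst) for src, dst in edges if src not in nodes or dst not in nodes]


# Toy data, not taken from the fixture.
reference = {"Server", "Server.start", "parseConfig"}
builtin = {"Server", "Server.start", "parseConfig", "helper"}
edges = [("Server.start", "parseConfig"), ("helper", "missingSymbol")]

coverage = symbol_coverage(reference, builtin)
dangling = dangling_edges(builtin, edges)
print(f"coverage={coverage:.0%} pass={coverage >= COVERAGE_FLOOR} dangling={dangling}")
```

Extra symbols on the built-in side do not lower coverage here, which is consistent with the built-in backend reporting more code nodes than graphify while still covering 100% of graphify's symbols.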

Optional SCIP precision pass

✅ precision: SCIP corrects the heuristic call edge — run() → A.handle under SCIP (heuristic wrongly linked B.handle)
✅ precision: strict no-op when disabled — graph unchanged when NEURALMIND_PRECISION is unset

Off by default (NEURALMIND_PRECISION); proven on tests/fixtures/scip_precision to replace a heuristic call edge with the compiler-accurate one a SCIP index resolves.
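
The two precision checks describe an opt-in correction that must leave the graph untouched when disabled. A rough sketch of that contract follows. Only the `NEURALMIND_PRECISION` variable name and the `run()`, `A.handle`, `B.handle` example come from the report; the function name, the dict-based edge representation, and treating any non-empty value as "enabled" are assumptions.

```python
import os
from typing import Mapping


def apply_precision(
    heuristic_edges: dict,
    scip_edges: Mapping,
    environ: Mapping = os.environ,
) -> dict:
    """Return call edges, corrected by SCIP data only when the toggle is set."""
    if not environ.get("NEURALMIND_PRECISION"):
        # Strict no-op: hand back the very same object, unchanged.
        return heuristic_edges
    corrected = dict(heuristic_edges)  # copy so the input is never mutated
    for call_site, target in scip_edges.items():
        if call_site in corrected:
            corrected[call_site] = target  # compiler-accurate target wins
    return corrected


heuristic = {"run()": "B.handle"}   # what the heuristic linked
scip = {"run()": "A.handle"}        # what a SCIP index resolves

disabled = apply_precision(heuristic, scip, environ={})
enabled = apply_precision(heuristic, scip, environ={"NEURALMIND_PRECISION": "1"})
print(disabled, enabled)
```

Passing `environ` explicitly keeps the toggle testable without modifying the process environment.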

github-actions · 2026-06-12T05:53:41Z

NeuralMind self-benchmark

Status: PASS — floor 4×, measured 6.2×.

Phase 1 — Reduction on committed fixture

Average reduction: 6.2×
Top-k retrieval hit rate: 71.7%
Naive baseline: 47,360 tokens (all fixture files concatenated)
NeuralMind total: 7,706 tokens across 10 queries
Estimated monthly savings @ 100 queries/day on Claude 3.5 Sonnet: ~$35.69

#	Query	Shape	Naive	NeuralMind	Ratio	Hit
1	`auth-flow`	cross-file	4,736	773	6.1×	33.3%
2	`api-endpoints`	focused	4,736	758	6.2×	100.0%
3	`billing-flow`	cross-file	4,736	774	6.1×	33.3%
4	`user-storage`	cross-file	4,736	651	7.3×	50.0%
5	`jwt-verify`	focused	4,736	669	7.1×	100.0%
6	`stripe-webhook`	focused	4,736	801	5.9×	100.0%
7	`create-user`	cross-file	4,736	771	6.1×	50.0%
8	`refund`	focused	4,736	760	6.2×	100.0%
9	`db-choice`	identity	4,736	854	5.5×	100.0%
10	`invoice-send`	cross-file	4,736	895	5.3×	50.0%
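
The Phase 1 summary figures follow from the per-query rows. The sketch below recomputes them; it assumes the average reduction is the mean of per-query ratios, that the 33.3% hit rates are exactly 1/3, and that "monthly" means 30 days. None of these is stated in the report, but together they reproduce the published numbers.

```python
NAIVE_PER_QUERY = 4736                      # tokens, from the table
NEURALMIND_TOKENS = [773, 758, 774, 651, 669, 801, 771, 760, 854, 895]
HIT_RATES = [1 / 3, 1.0, 1 / 3, 0.5, 1.0, 1.0, 0.5, 1.0, 1.0, 0.5]

QUERIES_PER_DAY = 100
DAYS_PER_MONTH = 30                         # assumption, not stated in the report
PRICE_PER_MTOK = 3.0                        # Claude 3.5 Sonnet input, USD per million tokens

n = len(NEURALMIND_TOKENS)
total_naive = NAIVE_PER_QUERY * n
total_neuralmind = sum(NEURALMIND_TOKENS)

# Mean of the per-query ratios (not the ratio of the totals).
mean_ratio = sum(NAIVE_PER_QUERY / t for t in NEURALMIND_TOKENS) / n
mean_hit_rate = sum(HIT_RATES) / n

saved_per_query = (total_naive - total_neuralmind) / n
monthly_savings = (
    saved_per_query * QUERIES_PER_DAY * DAYS_PER_MONTH * PRICE_PER_MTOK / 1_000_000
)

print(f"naive total:      {total_naive:,} tokens")
print(f"NeuralMind total: {total_neuralmind:,} tokens")
print(f"mean ratio:       {mean_ratio:.1f}x")
print(f"mean hit rate:    {mean_hit_rate:.1%}")
print(f"monthly savings:  ${monthly_savings:.2f}")
```

Note that dividing the totals (47,360 / 7,706) gives roughly 6.1×, so the reported 6.2× is consistent with averaging the per-query ratios instead.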

Phase 2 — Synapse recall A/B (same warm graph, recall off vs on)

Synapse edges after seeding co-editing sessions: 2834
Top-k hit rate: 71.7% off → 83.3% on (Δ +11.7 points)
Reduction ratio: 6.2× off → 6.2× on (Δ -0.07× — budget-neutral by design)

The Hebbian synapse layer is now the single learning measurement (the old
learned_patterns reranker was removed). The hit-rate delta shows associative recall
surfacing co-edited modules a purely textual search ranks lower; the near-zero reduction
delta confirms it does so without spending extra tokens (recalled nodes displace the
weakest hits, not add to them).
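
The "displace, not add" behaviour described above can be illustrated with a small merge function. This is only a sketch of the idea: the function name, the `(node, score)` representation, and the file names are hypothetical, and the real synapse layer is not shown in this report.

```python
def merge_recall(hits: list, recalled: list, k: int) -> list:
    """Merge recalled nodes into the top-k hits without growing the result.

    hits:     (node, score) pairs sorted by score, highest first.
    recalled: node ids surfaced by associative recall.
    Returns at most k node ids; recalled nodes replace the weakest hits.
    """
    kept = [node for node, _ in hits[:k]]
    present = set(kept)
    # New nodes only, de-duplicated, order preserved, never more than k.
    extras = list(dict.fromkeys(n for n in recalled if n not in present))[:k]
    free_slots = k - len(kept)
    displaced = max(len(extras) - free_slots, 0)
    if displaced:
        kept = kept[: len(kept) - displaced]  # drop the weakest hits
    return kept + extras


hits = [("auth.py", 0.9), ("jwt.py", 0.8), ("users.py", 0.4)]
merged = merge_recall(hits, recalled=["billing.py", "jwt.py"], k=3)
print(merged)
```

Because the result never exceeds `k` nodes, the context budget stays roughly constant, which matches the near-zero reduction delta reported in Phase 2.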

Assumptions

Baseline: every .py file in tests/fixtures/sample_project/ concatenated.
Tokenizer: tiktoken GPT-4o encoding (per-model breakdown in multi_model.json if generated).
Pricing: Claude 3.5 Sonnet input @ $3.0/MTok.
Regression floor: 4× — well below NeuralMind's typical 40–70× on real repos.

Per-model token reduction

Model	Tokenizer	Naive	NeuralMind	Ratio	Source
GPT-4o / GPT-4o-mini	`tiktoken o200k_base`	4,739	779	6.1×	measured
GPT-4 / GPT-3.5-turbo	`tiktoken cl100k_base`	4,710	770	6.1×	measured
Claude 3.5 Sonnet	`estimated: GPT-4o × 1.08; install anthropic for an exact count`	5,118	841	6.1×	estimated
Llama 3 (70B)	`estimated: GPT-4o × 1.22 — Llama tokenizer requires model weights; estimate based on published vocab ratios`	5,781	950	6.1×	estimated

Rows marked measured use the provider's real tokenizer. Rows marked
estimated apply a published vocab-size correction to the GPT-4o count —
honest approximations, not hardcoded claims.
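
The estimated rows can be reproduced from the measured GPT-4o counts and the listed factors. In this sketch the function name is hypothetical, and truncating to an integer is an assumption: it happens to match the 5,781 shown for Llama 3 (rounding would give 5,782), but the tool's actual rounding rule is not shown here.

```python
GPT4O_NAIVE = 4739       # measured, o200k_base
GPT4O_NEURALMIND = 779   # measured, o200k_base

# Correction factors as listed in the table.
FACTORS = {"Claude 3.5 Sonnet": 1.08, "Llama 3 (70B)": 1.22}


def estimate_tokens(gpt4o_count: int, factor: float) -> int:
    """Scale a measured GPT-4o token count by a vocab-size correction factor."""
    return int(gpt4o_count * factor)  # truncation is an assumption


estimates = {
    model: (estimate_tokens(GPT4O_NAIVE, f), estimate_tokens(GPT4O_NEURALMIND, f))
    for model, f in FACTORS.items()
}
for model, (naive, reduced) in estimates.items():
    print(f"{model}: naive={naive:,} neuralmind={reduced:,} ratio={naive / reduced:.1f}x")
```

Since both columns are scaled by the same factor, the ratio for an estimated row is effectively the GPT-4o ratio, so those rows add no independent evidence about reduction.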

NeuralMind retrieval-quality eval

Suite	Queries	MRR	Answerability	Recall@5	Precision@5	Gate
`go`	10	0.950	100%	0.833	0.603	PASS
`python`	10	0.950	100%	0.833	0.678	PASS
`typescript`	10	0.900	100%	0.800	0.562	PASS

go vs baseline:

mrr: 0.950 (= +0.000)
answerability: 1.000 (= +0.000)
recall@1: 0.617 (= -0.000)
recall@3: 0.833 (= +0.000)
recall@5: 0.833 (= +0.000)

python vs baseline:

mrr: 0.950 (▲ +0.050)
answerability: 1.000 (= +0.000)
recall@1: 0.617 (▲ +0.100)
recall@3: 0.833 (= +0.000)
recall@5: 0.833 (= +0.000)

typescript vs baseline:

mrr: 0.900 (= +0.000)
answerability: 1.000 (= +0.000)
recall@1: 0.583 (= +0.000)
recall@3: 0.800 (= +0.000)
recall@5: 0.800 (= +0.000)

Overall: PASS
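
For readers unfamiliar with the metric names, here is a sketch of the textbook definitions of MRR and recall@k. The eval's own implementation is not shown in this report and may differ in details (for example, how ties or empty gold sets are handled); the function names and toy data are illustrative.

```python
def reciprocal_rank(ranked: list, relevant: set) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Share of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean(values: list) -> float:
    return sum(values) / len(values)


# Toy suite of two queries: (ranked results, gold set).
suite = [
    (["a", "b", "c"], {"a"}),
    (["x", "y", "z"], {"y", "q"}),
]

mrr = mean([reciprocal_rank(r, gold) for r, gold in suite])
recall_1 = mean([recall_at_k(r, gold, 1) for r, gold in suite])
recall_3 = mean([recall_at_k(r, gold, 3) for r, gold in suite])
print(f"mrr={mrr:.3f} recall@1={recall_1:.3f} recall@3={recall_3:.3f}")
```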

Automated by .github/workflows/ci-benchmark.yml — regenerate locally with python -m tests.benchmark.run and neuralmind benchmark --quality.

tests/test_integration_retrieval.py creates its own TemporaryDirectory hosting a ChromaDB store. Fixture teardown is LIFO, so the directory is removed before conftest's autouse chroma-release fixture runs; the embedder close in initialized_mind usually suffices, but Windows can hold the sqlite handle a beat longer (AV scans, deferred closes), which intermittently failed teardown with WinError 32. Mirror the conftest fixtures: ignore_cleanup_errors=True. https://claude.ai/code/session_01FkHXHcjpWZL2EWn4HGi547

Copilot

Pull request overview

Updates the documentation landing page to reflect v0.25.0 as the latest NeuralMind release (badge, JSON-LD version, and release timeline). The PR also includes an unrelated test fixture teardown change intended to avoid Windows failures with ChromaDB-backed temp directories.

Changes:

Bump landing-page “latest release” markers from v0.24.0 to v0.25.0 (badge, JSON-LD, timeline).
Adjust the v0.25.0 timeline pill to “Latest release” and demote v0.24.0 to plain history.
Update an integration-test temp directory fixture to use TemporaryDirectory(ignore_cleanup_errors=True) with expanded teardown rationale.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
docs/index.html	Advances the landing page’s “latest release” display and structured metadata to v0.25.0.
tests/test_integration_retrieval.py	Makes the integration test fixture teardown more resilient (especially on Windows) by ignoring temp-dir cleanup errors.


+    """Create a minimal project with sample graph for testing.
+
+    ``ignore_cleanup_errors``: this directory hosts a ChromaDB store, and
+    fixture teardown is LIFO — the temp dir exits before conftest's
+    autouse chroma-release fixture runs, so on Windows a still-open (or
+    just-released, AV-scanned) sqlite handle would otherwise fail the
+    teardown even though ``initialized_mind`` closes the embedder.
+    """
+    with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:


            <div class="tl-item dev">
-                <div class="tl-head"><b>v0.25.0</b><span class="pill pill-dev">In development</span></div>
+                <div class="tl-head"><b>v0.25.0</b><span class="pill pill-latest">Latest release</span></div>


github-actions Bot added the documentation and question labels Jun 12, 2026

dfrostar marked this pull request as ready for review June 12, 2026 06:08

Copilot AI review requested due to automatic review settings June 12, 2026 06:08

dfrostar merged commit edb5b05 into main Jun 12, 2026
19 checks passed

Copilot started reviewing on behalf of dfrostar June 12, 2026 06:09 View session

github-actions Bot mentioned this pull request Jun 12, 2026

chore(main): release 0.25.1 #232

Open

Copilot AI reviewed Jun 12, 2026

dfrostar merged 2 commits into main from claude/awesome-wozniak-uhq13c

3 participants
