
docs: mark v0.25.0 as the latest release on the landing page#231

Merged
dfrostar merged 2 commits into main from claude/awesome-wozniak-uhq13c
Jun 12, 2026

@dfrostar (Owner)

v0.25.0 shipped, so the landing page's "latest" markers move forward, mirroring what #224 did for v0.23.0:

  • Hero badge: v0.24.0 → v0.25.0 (and its release-notes link)
  • JSON-LD softwareVersion: 0.24.0 → 0.25.0
  • Timeline: v0.25.0 takes the "Latest release" pill (was "In development"); v0.24.0 joins the plain history trail

No other content changes.
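Because each release bump has to move several markers in lockstep, a small consistency check can catch a half-finished bump. The sketch below is hypothetical and not part of this PR: the regex patterns are modeled on the timeline excerpt quoted in the review, the real `docs/index.html` markup may differ, and the hero badge is not checked because its markup is not shown here.

```python
import json
import re

# Hypothetical patterns modeled on the timeline excerpt in this PR; the real
# docs/index.html markup (and the hero badge, not checked here) may differ.
LD_JSON = re.compile(r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL)
LATEST_PILL = re.compile(r'<b>v([0-9.]+)</b><span class="pill pill-latest">')


def latest_markers(html: str) -> tuple[str, list[str]]:
    """Return the JSON-LD softwareVersion and every version wearing the latest pill."""
    match = LD_JSON.search(html)
    if match is None:
        raise ValueError("no JSON-LD block found")
    version = json.loads(match.group(1))["softwareVersion"]
    return version, LATEST_PILL.findall(html)


def check_latest(html: str) -> None:
    """Require exactly one 'Latest release' pill, matching the JSON-LD version."""
    version, pills = latest_markers(html)
    if pills != [version]:
        raise AssertionError(f"JSON-LD says {version}; latest pill(s): {pills}")


if __name__ == "__main__":
    page = (
        '<script type="application/ld+json">{"softwareVersion": "0.25.0"}</script>\n'
        '<div class="tl-head"><b>v0.25.0</b>'
        '<span class="pill pill-latest">Latest release</span></div>\n'
        '<div class="tl-head"><b>v0.24.0</b></div>\n'
    )
    check_latest(page)  # raises if the markers disagree
    print("latest markers agree:", latest_markers(page))
```

Requiring exactly one pill also catches the case where the old release keeps its "Latest release" pill after the new one gains it.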

https://claude.ai/code/session_01FkHXHcjpWZL2EWn4HGi547


Generated by Claude Code

v0.25.0 shipped, so the hero badge, JSON-LD softwareVersion, and the
timeline pills move forward: v0.25.0 takes the "Latest release" pill
and v0.24.0 joins the plain history trail.

https://claude.ai/code/session_01FkHXHcjpWZL2EWn4HGi547
github-actions (bot) added the documentation and question labels Jun 12, 2026
github-actions (bot, Contributor) commented Jun 12, 2026

Backend parity gate — graphify vs built-in tree-sitter

✅ PASS — the built-in backend must stay within tolerance of graphify on the reference fixture.

| Metric | graphify | built-in |
|---|---|---|
| code nodes | 65 | 79 |
| mean reduction | 6.05× | 6.66× |
| faithfulness delta | -0.047 | +0.143 |
| fact recall | 0.527 | 0.717 |
| grounding | 0.889 | 1.000 |

Gate checks

  • reduction within tolerance of graphify — built-in 6.66× ≥ 4.54× (graphify 6.05× − 25%)
  • reduction ≥ absolute floor — built-in 6.66× ≥ floor 4.00×
  • faithfulness delta within tolerance of graphify — built-in +0.143 ≥ -0.147 (graphify -0.047 − 0.10)
  • faithfulness delta ≥ absolute floor — built-in +0.143 ≥ floor +0.000
  • fact recall within tolerance of graphify — built-in 0.717 ≥ 0.427 (graphify 0.527 − 0.10)

Tolerances: reduction within 25% (floor 4.0×), faithfulness within 0.10 (floor +0.00), fact recall within 0.10. Override via NEURALMIND_PARITY_* env vars.

Automated by evals/parity/run.py — reproduce locally with python -m evals.parity.run.
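The five gate checks reduce to simple comparisons against graphify's numbers. This is a minimal sketch under assumptions, not the code in `evals/parity/run.py`: the `ParityMetrics` type and `gate_checks` function are hypothetical, and the env-var overrides are left out because their exact names are not shown in the report. It reproduces the thresholds above (6.05 × 0.75 ≈ 4.54, -0.047 - 0.10 = -0.147, 0.527 - 0.10 = 0.427).

```python
from dataclasses import dataclass

# Tolerances as quoted in the gate summary. The real runner lets
# NEURALMIND_PARITY_* env vars override them; those variable names are not
# shown in this report, so they are not modeled here.
REDUCTION_REL_TOL = 0.25    # built-in may trail graphify by up to 25%
REDUCTION_FLOOR = 4.0       # but never below 4.0x
FAITHFULNESS_TOL = 0.10
FAITHFULNESS_FLOOR = 0.0
FACT_RECALL_TOL = 0.10


@dataclass(frozen=True)
class ParityMetrics:
    reduction: float            # mean token reduction, e.g. 6.66 for 6.66x
    faithfulness_delta: float
    fact_recall: float


def gate_checks(graphify: ParityMetrics, builtin: ParityMetrics) -> dict[str, bool]:
    """Evaluate the five gate checks; the gate passes only if all are True."""
    return {
        "reduction within tolerance":
            builtin.reduction >= graphify.reduction * (1 - REDUCTION_REL_TOL),
        "reduction >= floor": builtin.reduction >= REDUCTION_FLOOR,
        "faithfulness within tolerance":
            builtin.faithfulness_delta >= graphify.faithfulness_delta - FAITHFULNESS_TOL,
        "faithfulness >= floor": builtin.faithfulness_delta >= FAITHFULNESS_FLOOR,
        "fact recall within tolerance":
            builtin.fact_recall >= graphify.fact_recall - FACT_RECALL_TOL,
    }


if __name__ == "__main__":
    graphify = ParityMetrics(reduction=6.05, faithfulness_delta=-0.047, fact_recall=0.527)
    builtin = ParityMetrics(reduction=6.66, faithfulness_delta=0.143, fact_recall=0.717)
    checks = gate_checks(graphify, builtin)
    print("PASS" if all(checks.values()) else "FAIL", checks)
```

Pairing a relative tolerance with an absolute floor means the built-in backend cannot pass just because graphify itself regressed.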

Multi-language structural parity

| Language | graphify symbols | built-in covers | dangling |
|---|---|---|---|
| typescript | 54 | 54 (100%) | 0 |
| go | 45 | 45 (100%) | 0 |

  • typescript: symbol coverage ≥ floor — 54/54 graphify symbols (100%) ≥ 90%
  • typescript: no dangling edges — 0 dangling edge(s)
  • go: symbol coverage ≥ floor — 45/45 graphify symbols (100%) ≥ 90%
  • go: no dangling edges — 0 dangling edge(s)

Coverage floor: 90% of graphify's per-language symbols (no gold-fact set exists for TS/Go, so parity is structural).
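Without gold facts, the structural check comes down to set coverage plus a dangling-edge scan. The sketch below is a hypothetical illustration of those two checks, with symbols modeled as plain strings and edges as `(source, target)` pairs; the parity runner's real data structures are not shown here.

```python
def structural_checks(
    graphify_symbols: set[str],
    builtin_symbols: set[str],
    builtin_edges: list[tuple[str, str]],
    floor: float = 0.90,
) -> dict[str, bool]:
    """Coverage of graphify's symbols plus a dangling-edge scan of the built-in graph."""
    covered = graphify_symbols & builtin_symbols
    coverage = len(covered) / len(graphify_symbols) if graphify_symbols else 1.0
    # An edge dangles when either endpoint is missing from the built-in node set.
    dangling = [
        (src, dst) for src, dst in builtin_edges
        if src not in builtin_symbols or dst not in builtin_symbols
    ]
    return {"coverage >= floor": coverage >= floor, "no dangling edges": not dangling}
```

With the TypeScript numbers above, 54 of 54 graphify symbols gives coverage 1.0, comfortably over the 0.90 floor.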

Optional SCIP precision pass

  • precision: SCIP corrects the heuristic call edge — run() → A.handle under SCIP (heuristic wrongly linked B.handle)
  • precision: strict no-op when disabled — graph unchanged when NEURALMIND_PRECISION is unset

Off by default (NEURALMIND_PRECISION); proven on tests/fixtures/scip_precision to replace a heuristic call edge with the compiler-accurate one a SCIP index resolves.
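The two precision checks describe an opt-in edge rewrite that must do nothing when disabled. The sketch below assumes shapes that this report does not specify: a graph is a set of `(caller, qualified_callee)` call edges, and the SCIP index is reduced to a hypothetical `{(caller, method_name): qualified_target}` lookup. Only the `NEURALMIND_PRECISION` variable name comes from the report. In this sketch any non-empty value enables the pass, which is an assumption.

```python
import os

# Hypothetical shapes: call edges are (caller, qualified_callee) pairs, and the
# SCIP index is reduced to {(caller, bare_method_name): qualified_target}.
Edge = tuple[str, str]


def apply_precision(edges: set[Edge], scip_targets: dict[tuple[str, str], str]) -> set[Edge]:
    """Swap heuristic call targets for SCIP-resolved ones when the flag is set."""
    if not os.environ.get("NEURALMIND_PRECISION"):
        return edges  # strict no-op: the caller gets the very same object back
    corrected: set[Edge] = set()
    for caller, callee in edges:
        method = callee.rsplit(".", 1)[-1]  # "B.handle" -> "handle"
        corrected.add((caller, scip_targets.get((caller, method), callee)))
    return corrected
```

For the fixture's scenario, a heuristic edge `("run", "B.handle")` becomes `("run", "A.handle")` once the index resolves `run`'s `handle` call to `A`, while unresolved edges pass through unchanged.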

github-actions (bot, Contributor) commented Jun 12, 2026

NeuralMind self-benchmark

Status: PASS — floor , measured 6.2×.

Phase 1 — Reduction on committed fixture

  • Average reduction: 6.2×
  • Top-k retrieval hit rate: 71.7%
  • Naive baseline: 47,360 tokens across 10 queries (4,736 per query: all fixture files concatenated)
  • NeuralMind total: 7,706 tokens across 10 queries
  • Estimated monthly savings @ 100 queries/day on Claude 3.5 Sonnet: ~$35.69
| # | Query | Shape | Naive | NeuralMind | Ratio | Hit |
|---|---|---|---|---|---|---|
| 1 | auth-flow | cross-file | 4,736 | 773 | 6.1× | 33.3% |
| 2 | api-endpoints | focused | 4,736 | 758 | 6.2× | 100.0% |
| 3 | billing-flow | cross-file | 4,736 | 774 | 6.1× | 33.3% |
| 4 | user-storage | cross-file | 4,736 | 651 | 7.3× | 50.0% |
| 5 | jwt-verify | focused | 4,736 | 669 | 7.1× | 100.0% |
| 6 | stripe-webhook | focused | 4,736 | 801 | 5.9× | 100.0% |
| 7 | create-user | cross-file | 4,736 | 771 | 6.1× | 50.0% |
| 8 | refund | focused | 4,736 | 760 | 6.2× | 100.0% |
| 9 | db-choice | identity | 4,736 | 854 | 5.5× | 100.0% |
| 10 | invoice-send | cross-file | 4,736 | 895 | 5.3× | 50.0% |
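The Phase 1 summary figures can be recomputed from the table. This is a sketch, not the benchmark's code: `summarize` is a hypothetical helper, and the 30-day month is an assumption that happens to reproduce the ~$35.69 estimate when the GPT-4o token savings are priced at the $3.0/MTok Claude 3.5 Sonnet input rate.

```python
# (query, naive tokens, NeuralMind tokens, hit rate %) from the Phase 1 table.
ROWS = [
    ("auth-flow", 4736, 773, 33.3),
    ("api-endpoints", 4736, 758, 100.0),
    ("billing-flow", 4736, 774, 33.3),
    ("user-storage", 4736, 651, 50.0),
    ("jwt-verify", 4736, 669, 100.0),
    ("stripe-webhook", 4736, 801, 100.0),
    ("create-user", 4736, 771, 50.0),
    ("refund", 4736, 760, 100.0),
    ("db-choice", 4736, 854, 100.0),
    ("invoice-send", 4736, 895, 50.0),
]


def summarize(rows, queries_per_day=100, days_per_month=30, usd_per_mtok=3.0):
    """Return (mean reduction, mean hit rate %, estimated monthly USD saved)."""
    ratios = [naive / nm for _, naive, nm, _ in rows]
    mean_reduction = sum(ratios) / len(ratios)
    mean_hit = sum(hit for *_, hit in rows) / len(rows)
    saved_per_query = sum(naive - nm for _, naive, nm, _ in rows) / len(rows)
    monthly_tokens = saved_per_query * queries_per_day * days_per_month
    return mean_reduction, mean_hit, monthly_tokens * usd_per_mtok / 1_000_000


if __name__ == "__main__":
    reduction, hit, usd = summarize(ROWS)
    print(f"{reduction:.1f}x  {hit:.1f}%  ~${usd:.2f}/month")
```

The reported 6.2× matches the mean of the per-query ratios; the aggregate ratio 47,360 / 7,706 is slightly lower, at about 6.1×.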

Phase 2 — Synapse recall A/B (same warm graph, recall off vs on)

  • Synapse edges after seeding co-editing sessions: 2834
  • Top-k hit rate: 71.7% off → 83.3% on (Δ +11.7 points)
  • Reduction ratio: 6.2× off → 6.2× on (Δ -0.07× — budget-neutral by design)
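"Seeding co-editing sessions" suggests a Hebbian rule: files edited together get a stronger connection. The sketch below is a hypothetical illustration of that idea with a flat increment per co-edit. NeuralMind's actual update rule (any decay, normalization, or caps) is not described here, and the sketch makes no attempt to reproduce the 2834-edge count.

```python
from collections import defaultdict
from itertools import combinations


def seed_synapses(sessions: list[list[str]], increment: float = 1.0) -> dict[tuple[str, str], float]:
    """Hebbian-style seeding: every pair of files co-edited in a session gets stronger."""
    weights = defaultdict(float)
    for files in sessions:
        # Sort so (a, b) and (b, a) map to the same undirected edge.
        for a, b in combinations(sorted(set(files)), 2):
            weights[(a, b)] += increment
    return dict(weights)


def recall(weights: dict[tuple[str, str], float], node: str, k: int = 3) -> list[str]:
    """Return the k most strongly connected neighbours of node."""
    scored = []
    for (a, b), w in weights.items():
        if node in (a, b):
            scored.append((w, b if a == node else a))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [n for _, n in scored[:k]]
```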

The Hebbian synapse layer is now the single learning measurement (the old
learned_patterns reranker was removed). The hit-rate delta shows associative recall
surfacing co-edited modules that a purely textual search ranks lower; the near-zero reduction
delta confirms it does so without spending extra tokens (recalled nodes displace the
weakest hits rather than being added to them).
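The displacement behavior above keeps the token budget flat because the result list never grows past k. Here is a minimal sketch of that policy. `merge_with_recall` and its `max_recall` cap are hypothetical and are not NeuralMind's API.

```python
def merge_with_recall(
    hits: list[tuple[str, float]],
    recalled: list[str],
    k: int,
    max_recall: int = 2,
) -> list[str]:
    """Fill at most max_recall of the k slots with recalled nodes.

    hits: textual search results as (node, score), best first.
    recalled: synapse-recall candidates, strongest first.
    The result never exceeds k items, so the token budget does not grow.
    """
    already = {node for node, _ in hits[:k]}
    fresh = [n for n in recalled if n not in already][:min(max_recall, k)]
    keep = k - len(fresh)
    # The weakest textual hits fall off the end to make room.
    return [node for node, _ in hits[:keep]] + fresh
```

Recalled nodes that textual search already returned are skipped, so they neither duplicate a slot nor push anything out.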

Assumptions

  • Baseline: every .py file in tests/fixtures/sample_project/ concatenated.
  • Tokenizer: tiktoken GPT-4o encoding (per-model breakdown in multi_model.json if generated).
  • Pricing: Claude 3.5 Sonnet input @ $3.0/MTok.
  • Regression floor: — well below NeuralMind's typical 40–70× on real repos.

Per-model token reduction

| Model | Tokenizer | Naive | NeuralMind | Ratio | Source |
|---|---|---|---|---|---|
| GPT-4o / GPT-4o-mini | tiktoken o200k_base | 4,739 | 779 | 6.1× | measured |
| GPT-4 / GPT-3.5-turbo | tiktoken cl100k_base | 4,710 | 770 | 6.1× | measured |
| Claude 3.5 Sonnet | estimated: GPT-4o × 1.08 (install anthropic for an exact count) | 5,118 | 841 | 6.1× | estimated |
| Llama 3 (70B) | estimated: GPT-4o × 1.22 (Llama tokenizer requires model weights; estimate based on published vocab ratios) | 5,781 | 950 | 6.1× | estimated |

Rows marked measured use the provider's real tokenizer. Rows marked
estimated apply a published vocab-size correction to the GPT-4o count —
honest approximations, not hardcoded claims.
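The estimated rows scale both GPT-4o counts by the same factor, so the ratio stays at the GPT-4o value. That is why every row reads 6.1×. In the sketch below, `estimate_row` is a hypothetical helper. Truncating to an integer is inferred from the table: 4,739 × 1.22 = 5,781.58, which the table lists as 5,781 where rounding would give 5,782.

```python
# Correction factors quoted in the table; they are published-vocab estimates,
# not measurements from the real Claude or Llama tokenizers.
FACTORS = {"Claude 3.5 Sonnet": 1.08, "Llama 3 (70B)": 1.22}


def estimate_row(gpt4o_naive: int, gpt4o_neuralmind: int, factor: float) -> tuple[int, int, float]:
    """Scale both GPT-4o counts by one factor; truncating matches the table (5,781)."""
    naive = int(gpt4o_naive * factor)
    neuralmind = int(gpt4o_neuralmind * factor)
    return naive, neuralmind, naive / neuralmind


if __name__ == "__main__":
    for model, factor in FACTORS.items():
        naive, nm, ratio = estimate_row(4739, 779, factor)
        print(f"{model}: {naive:,} -> {nm:,} ({ratio:.1f}x)")
```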

NeuralMind retrieval-quality eval

| Suite | Queries | MRR | Answerability | Recall@5 | Precision@5 | Gate |
|---|---|---|---|---|---|---|
| go | 10 | 0.950 | 100% | 0.833 | 0.603 | PASS |
| python | 10 | 0.950 | 100% | 0.833 | 0.678 | PASS |
| typescript | 10 | 0.900 | 100% | 0.800 | 0.562 | PASS |
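For reference, the standard definitions of the ranking metrics in this table are sketched below. These are textbook formulas, and NeuralMind's eval may differ in details such as deduplication or the precision denominator. Answerability is not sketched because the report does not define it.

```python
def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant items that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k slots occupied by relevant items."""
    return len(set(ranked[:k]) & relevant) / k


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """MRR over a suite: the mean reciprocal rank across its queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

Under these definitions, an MRR of 0.950 over 10 queries is consistent with, for example, nine queries whose first hit is at rank 1 and one whose first hit is at rank 2.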

go vs baseline:

  • mrr: 0.950 (= +0.000)
  • answerability: 1.000 (= +0.000)
  • recall@1: 0.617 (= -0.000)
  • recall@3: 0.833 (= +0.000)
  • recall@5: 0.833 (= +0.000)

python vs baseline:

  • mrr: 0.950 (▲ +0.050)
  • answerability: 1.000 (= +0.000)
  • recall@1: 0.617 (▲ +0.100)
  • recall@3: 0.833 (= +0.000)
  • recall@5: 0.833 (= +0.000)

typescript vs baseline:

  • mrr: 0.900 (= +0.000)
  • answerability: 1.000 (= +0.000)
  • recall@1: 0.583 (= +0.000)
  • recall@3: 0.800 (= +0.000)
  • recall@5: 0.800 (= +0.000)
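The "= -0.000" entries are consistent with a tiny negative difference that falls below display precision. One plausible way to produce these markers is sketched below. `format_delta` and its `eps` threshold are hypothetical, not taken from the benchmark code.

```python
def format_delta(current: float, baseline: float, eps: float = 5e-4) -> str:
    """Render 'value (marker delta)'; changes within eps count as unchanged."""
    delta = current - baseline
    if delta > eps:
        marker = "▲"
    elif delta < -eps:
        marker = "▼"
    else:
        marker = "="
    # A delta like -1e-6 still prints its sign, giving "-0.000".
    return f"{current:.3f} ({marker} {delta:+.3f})"
```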

Overall: PASS


Automated by .github/workflows/ci-benchmark.yml — regenerate locally with python -m tests.benchmark.run and neuralmind benchmark --quality.

tests/test_integration_retrieval.py creates its own TemporaryDirectory
hosting a ChromaDB store. Fixture teardown is LIFO, so the directory is
removed before conftest's autouse chroma-release fixture runs; the
embedder close in initialized_mind usually suffices, but Windows can
hold the sqlite handle a beat longer (AV scans, deferred closes), which
intermittently failed teardown with WinError 32. Mirror the conftest
fixtures: ignore_cleanup_errors=True.

https://claude.ai/code/session_01FkHXHcjpWZL2EWn4HGi547
dfrostar marked this pull request as ready for review June 12, 2026 06:08
Copilot AI review requested due to automatic review settings June 12, 2026 06:08
dfrostar merged commit edb5b05 into main Jun 12, 2026
19 checks passed

Copilot AI left a comment


Pull request overview

Updates the documentation landing page to reflect v0.25.0 as the latest NeuralMind release (badge, JSON-LD version, and release timeline). The PR also includes an unrelated test fixture teardown change intended to avoid Windows failures with ChromaDB-backed temp directories.

Changes:

  • Bump landing-page “latest release” markers from v0.24.0 to v0.25.0 (badge, JSON-LD, timeline).
  • Adjust the v0.25.0 timeline pill to “Latest release” and demote v0.24.0 to plain history.
  • Update an integration-test temp directory fixture to use TemporaryDirectory(ignore_cleanup_errors=True) with expanded teardown rationale.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| docs/index.html | Advances the landing page’s “latest release” display and structured metadata to v0.25.0. |
| tests/test_integration_retrieval.py | Makes the integration test fixture teardown more resilient (especially on Windows) by ignoring temp-dir cleanup errors. |


Comment on lines +22 to +26
``ignore_cleanup_errors``: this directory hosts a ChromaDB store, and
fixture teardown is LIFO — the temp dir exits before conftest's
autouse chroma-release fixture runs, so on Windows a still-open (or
just-released, AV-scanned) sqlite handle would otherwise fail the
teardown even though ``initialized_mind`` closes the embedder.
Comment on lines +20 to +28
"""Create a minimal project with sample graph for testing.

``ignore_cleanup_errors``: this directory hosts a ChromaDB store, and
fixture teardown is LIFO — the temp dir exits before conftest's
autouse chroma-release fixture runs, so on Windows a still-open (or
just-released, AV-scanned) sqlite handle would otherwise fail the
teardown even though ``initialized_mind`` closes the embedder.
"""
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
Comment thread docs/index.html
Comment on lines 586 to +587
 <div class="tl-item dev">
-<div class="tl-head"><b>v0.25.0</b><span class="pill pill-dev">In development</span></div>
+<div class="tl-head"><b>v0.25.0</b><span class="pill pill-latest">Latest release</span></div>