fix: make the test suite Windows-green and restore full Windows support #228
Closes the four Windows failure classes from the cross-platform CI run (509 passed / 5 failed / 134 errors on windows-latest):

- chromadb teardown (~134 errors): GraphEmbedder.close() now stops the client's cached System and evicts it from chroma's per-path cache, releasing the sqlite/HNSW file handles Windows needs closed before a directory can be deleted. The previous close() deleted the collection, which released nothing and destroyed data. NeuralMind.close() added on top; conftest releases all cached Systems after every test and the temp-project fixtures ignore residual cleanup errors.
- event-log rotation (3 failures): the tailer's read handle is opened with FILE_SHARE_DELETE on Windows (CreateFileW), so a logrotate-style rename of the live log no longer throws PermissionError under a reader. POSIX path unchanged.
- concurrent appends (1 failure): recent-queries appends are a single O_APPEND write serialized by a process-local lock, plus a best-effort cross-process byte-range lock on Windows shared with compaction. POSIX behavior is unchanged (O_APPEND was already atomic).
- executable-bit test (1 failure): skipped on Windows, which has no POSIX execute bit.

windows-latest (Python 3.12) rejoins the gating matrix, COMPATIBILITY.md restores the Windows row to Full, and the landing page's schema.org operatingSystem claims Windows again (and marks v0.24.0 as the latest release now that it has shipped).

Fixes #186
https://claude.ai/code/session_01FkHXHcjpWZL2EWn4HGi547
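As a rough illustration of the event-log rotation fix listed above, the sketch below opens a file for reading with FILE_SHARE_DELETE via CreateFileW on Windows and falls back to a plain open(path, "rb") elsewhere. It is not the PR's code: the function name open_shared_for_tail is hypothetical, and the exact flags the tailer passes are an assumption.

```python
import os
import sys


def open_shared_for_tail(path):
    """Open `path` for binary reading so a rename of the live file can succeed.

    Hypothetical helper, not taken from the PR.
    """
    if sys.platform != "win32":
        # POSIX already allows renaming a file that another process has open.
        return open(path, "rb")

    import ctypes
    import msvcrt
    from ctypes import wintypes

    GENERIC_READ = 0x80000000
    FILE_SHARE_READ = 0x00000001
    FILE_SHARE_WRITE = 0x00000002
    FILE_SHARE_DELETE = 0x00000004
    OPEN_EXISTING = 3
    FILE_ATTRIBUTE_NORMAL = 0x80
    INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.argtypes = [
        wintypes.LPCWSTR,  # lpFileName
        wintypes.DWORD,    # dwDesiredAccess
        wintypes.DWORD,    # dwShareMode
        wintypes.LPVOID,   # lpSecurityAttributes
        wintypes.DWORD,    # dwCreationDisposition
        wintypes.DWORD,    # dwFlagsAndAttributes
        wintypes.HANDLE,   # hTemplateFile
    ]
    kernel32.CreateFileW.restype = wintypes.HANDLE

    handle = kernel32.CreateFileW(
        str(path),
        GENERIC_READ,
        # Share-delete is the part that lets a rotator rename the file.
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        None,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        None,
    )
    if handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())

    # Wrap the Win32 handle in a CRT file descriptor, then a Python file object.
    # Closing the file object also closes the underlying handle.
    fd = msvcrt.open_osfhandle(handle, os.O_RDONLY | os.O_BINARY)
    return os.fdopen(fd, "rb")
```

With a handle opened this way, a rotator can rename the log while the reader keeps its handle, and the reader continues to see the renamed file's contents until it reopens the path.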
Backend parity gate: graphify vs built-in tree-sitter
✅ PASS: the built-in backend must stay within tolerance of graphify on the reference fixture.
Gate checks
Tolerances: reduction within 25% (floor 4.0×), faithfulness within 0.10 (floor +0.00).
Multi-language structural parity
Coverage floor: 90% of graphify's per-language symbols (no gold-fact set exists for TS/Go, so parity is structural).
Optional SCIP precision pass
Off by default.
NeuralMind self-benchmark
Phase 1: Reduction on committed fixture
Phase 2: Learning uplift
Note: uplift numbers on a 500-line fixture are intentionally modest.
Phase 3: Synapse recall A/B (same warm graph, recall off vs on)
This isolates the Hebbian synapse layer from the …
Assumptions
Per-model token reduction
Rows marked measured use the provider's real tokenizer.
NeuralMind retrieval-quality eval
Overall: PASS
os.kill(pid, 0) is the POSIX "does this process exist" idiom, but on Windows signal.CTRL_C_EVENT == 0, so the call delivers a real Ctrl-C to the probed pid's console process group. In the test suite the discovery file records pytest's own pid, so the probe interrupted the whole run with a KeyboardInterrupt; for users it could interrupt any console the daemon shares. Probe via OpenProcess/GetExitCodeProcess instead on Windows; POSIX path unchanged. https://claude.ai/code/session_01FkHXHcjpWZL2EWn4HGi547
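The probe described in this commit might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the PR's implementation: the function name pid_alive is hypothetical, and the choice of PROCESS_QUERY_LIMITED_INFORMATION as the access right and the handling of access-denied errors are assumptions.

```python
import os
import sys


def pid_alive(pid):
    """Return True if a process with this pid appears to exist.

    Hypothetical helper, not taken from the PR.
    """
    if pid <= 0:
        # Non-positive pids address process groups on POSIX; never probe them.
        return False

    if sys.platform != "win32":
        try:
            os.kill(pid, 0)  # signal 0: existence check only, nothing delivered
        except ProcessLookupError:
            return False
        except PermissionError:
            return True  # exists, but owned by another user
        return True

    import ctypes
    from ctypes import wintypes

    PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
    STILL_ACTIVE = 259
    ERROR_ACCESS_DENIED = 5

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    kernel32.OpenProcess.restype = wintypes.HANDLE
    kernel32.GetExitCodeProcess.argtypes = [
        wintypes.HANDLE,
        ctypes.POINTER(wintypes.DWORD),
    ]
    kernel32.GetExitCodeProcess.restype = wintypes.BOOL
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    kernel32.CloseHandle.restype = wintypes.BOOL

    handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
    if not handle:
        # Access denied means the process exists but cannot be queried.
        return ctypes.get_last_error() == ERROR_ACCESS_DENIED
    try:
        exit_code = wintypes.DWORD()
        if not kernel32.GetExitCodeProcess(handle, ctypes.byref(exit_code)):
            return False
        # Caveat: a process that exited with code 259 would look alive here.
        return exit_code.value == STILL_ACTIVE
    finally:
        kernel32.CloseHandle(handle)
```

Unlike os.kill(pid, 0) on Windows, this path only queries the process and sends nothing to its console.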
Closes the four Windows failure classes from the cross-platform CI run (509 passed / 5 failed / 134 errors on windows-latest) and re-adds windows-latest to the gating matrix.
The four fixes
1. ChromaDB temp-dir teardown (~134 errors)
Chroma caches one System per storage path for the life of the process, holding the sqlite connection pool and HNSW segment files. GraphEmbedder.close() previously called delete_collection, which released nothing (and destroyed data a later open expected to find). It now stops the client's System and evicts it from SharedSystemClient._identifier_to_system, so Windows can actually delete the store afterwards. Verified against chromadb 1.5.9.
On top of that:
- NeuralMind.close() added (delegates to the backend, safe to call twice).
- tests/conftest.py gains an autouse fixture that stops every cached chroma System after each test, so handles are released regardless of whether the test cleaned up, and the temp_project/empty_project fixtures use ignore_cleanup_errors=True as a belt-and-suspenders measure.
2. Event-log rotation (3 failures)
The tailer holds its read handle across poll intervals, and rotation is a logrotate-style rename. POSIX allows renaming an open file; Windows' open() omits FILE_SHARE_DELETE, so the rotating process got PermissionError. The tailer now opens via CreateFileW with share-delete on Windows, recreating POSIX semantics, so rotation never depends on catching the tailer between polls. POSIX path is the same open(path, "rb") as before.
3. Concurrent recent-queries appends (1 failure)
The append relied on POSIX O_APPEND atomicity; Windows' CRT implements append as a separate seek-to-end + write, so 8 threads × 5 appends landed 37/40 lines. Appends are now a single os.write on an O_APPEND fd serialized by a process-local lock, plus a best-effort cross-process byte-range lock (msvcrt.locking, non-blocking with ~50ms retry) shared with _compact_recent_queries, so a compaction's read-truncate-rewrite can't drop a concurrent process's append either. POSIX behavior unchanged.
4. Executable-bit test (1 failure)
test_cmd_init_hook_makes_executable is skipped on Windows, which has no POSIX execute bit.
Support claims restored (issue checklist)
- windows-latest re-added to the test matrix in ci.yml (Python 3.12); the run on this PR is the proof
- docs/COMPATIBILITY.md Windows row restored to ✅ Full
- docs/index.html operatingSystem restored to "Linux, macOS, Windows"
Also folds in the post-release landing-page touch-up (same file as the operatingSystem edit): v0.24.0 is marked as the latest release in the hero badge, timeline, and JSON-LD softwareVersion, mirroring what #224 did for v0.23.0.
Verification
- black --check and ruff check clean; mypy introduces no new errors in the touched modules.
- windows-latest CI leg (the whole point of re-gating it).
Fixes #186
https://claude.ai/code/session_01FkHXHcjpWZL2EWn4HGi547
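The in-process half of the recent-queries append fix (item 3 above) can be sketched roughly as follows. This is a minimal sketch, not the PR's code: append_line and _append_lock are hypothetical names, and the best-effort cross-process msvcrt.locking byte-range lock is deliberately left out to keep the example short.

```python
import os
import threading

# Process-local lock: serializes appends from threads in this process.
# Hypothetical name; it does not protect against other processes.
_append_lock = threading.Lock()


def append_line(path, line):
    """Append one line to `path` with a single os.write on an O_APPEND fd."""
    data = (line.rstrip("\n") + "\n").encode("utf-8")
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT
    with _append_lock:
        fd = os.open(path, flags, 0o644)
        try:
            # One write call per record; short lines to a regular file are
            # expected to be written in full.
            os.write(fd, data)
        finally:
            os.close(fd)


def hammer(path, threads=8, per_thread=5):
    """Run concurrent appends and return the lines that landed."""
    def worker(i):
        for j in range(per_thread):
            append_line(path, f"thread-{i}-query-{j}")

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

Calling hammer on a fresh file mirrors the failing test's shape (8 threads × 5 appends); with the lock in place, all 40 lines should land, each intact.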
Generated by Claude Code