Chronological record of development activity on the Chisel project.
2026-04-14 -- Review Thirteen fixes (kimi_review_2): risk_map crash, suggest_tests/diff_impact working_tree timeouts
Addressed all confirmed bugs from Phase 13 WorkingTreeCoverageFuzzer findings:
- Fix: risk_map crash with working_tree=true. Root cause was KeyError: 'heuristic' in impact.py:get_risk_map() (and compute_risk_score()). Heuristic test edges created by _backfill_heuristic_edges() were not accounted for in the edge_type_counts dict. Added "heuristic": 0 to both dicts so risk scoring no longer crashes when heuristic edges exist.
- Fix: suggest_tests timeout on working-tree files. Added StaticImportIndex instance caching to ImpactAnalyzer (_get_static_index()) so repeated calls reuse the already-built index. Added a fast path in engine.py:tool_suggest_tests(): when working_tree=True and a file has no DB test edges, skip the expensive static index build and fall back directly to stem-matching.
- Fix: diff_impact timeout under working-tree load. Reused the cached StaticImportIndex via self.impact._get_static_index(). Changed tool_diff_impact() to only perform full static import scanning on tracked changed files; untracked files now rely on the stem-matching fallback. This prevents timeouts when hundreds of untracked files are present.

- chisel/impact.py: Added "heuristic" to edge_type_counts; added _get_static_index() cache to ImpactAnalyzer; replaced inline StaticImportIndex() instantiations with cached accessor
- chisel/engine.py: tool_suggest_tests() working-tree fast path; tool_diff_impact() static scan limited to tracked files + cached index usage
- CHANGELOG.md: Documented the three fixes under [Unreleased]
- findings/kimi_review_2/chisel.md → findings/kimi_review_2/chisel_closed.md: Renamed to indicate closure
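The index caching described above can be sketched as a lazily built attribute. This is a minimal illustration, not the project's code: the StaticImportIndex stand-in, its constructor argument, and the build counter are assumptions made only so the reuse is observable.

```python
class StaticImportIndex:
    """Stand-in for the real index; counts how many times it is built."""

    builds = 0

    def __init__(self, project_dir):
        StaticImportIndex.builds += 1  # hypothetical counter, for illustration only
        self.project_dir = project_dir


class ImpactAnalyzer:
    def __init__(self, project_dir):
        self._project_dir = project_dir
        self._static_index = None  # built lazily on first use

    def _get_static_index(self):
        # Build the expensive index once, then hand back the same instance.
        if self._static_index is None:
            self._static_index = StaticImportIndex(self._project_dir)
        return self._static_index
```

Callers that previously instantiated the index inline would go through the accessor instead, so repeated tool calls on one analyzer pay the build cost once.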
2026-04-10 -- Review Twelve fixes: single-author coupling, diff_impact working_tree, coverage granularity, risk reweighting
Addressed all 4 confirmed gaps from RecipeLab Review Twelve (TestAffinityAnalyzer + RecipeComplianceEngine challenge):
- Fix: Co-change coupling 0.0 for single-author. Detected distinct_authors == 1 in _compute_churn_and_coupling() and halved the adaptive threshold (max(1, threshold // 2)). Solo developers' commit patterns now surface coupling signal instead of universal 0.0. Author count stored in meta.distinct_authors.
- Feature: diff_impact working_tree parameter. Added working_tree: bool to tool_diff_impact(), matching suggest_tests behavior. When enabled, builds a StaticImportIndex to perform full static import scanning for untracked files, finding tests that import new files by path (not just stem-matching). Updated schema, dispatch table, and CLI.
- Fix: Binary coverage_gap. Increased _quantize_gap from 4 steps (0.25 increments) to 20 steps (0.05 increments) for finer granularity. Expanded proximity_adjustment to apply to any file with coverage_gap > 0.0 (not just completely untested files), giving partial credit to files imported by tested code.
- Fix: risk_map uniform reweighting threshold. Changed apply_risk_reweighting() to trigger on 2+ uniform components OR any zero-valued uniform component (provably absent data). Previously required 3+ uniform, which meant single-author projects with 2 zero-uniform components (coupling=0.0, test_instability=0.0) never got reweighted, diluting risk scores by ~40%.

- chisel/engine.py: Single-author detection + threshold halving, tool_diff_impact working_tree param with StaticImportIndex, new import
- chisel/impact.py: _quantize_gap steps 4→20, proximity adjustment for partial coverage
- chisel/risk_meta.py: Reweighting threshold lowered, zero-value uniform special case, uniform tracking as dict
- chisel/schemas.py: diff_impact schema: working_tree param + updated description, dispatch table
- chisel/cli.py: --working-tree flag for diff-impact subcommand
- tests/test_cli.py: Updated diff_impact mock assertion for new param
- tests/test_engine.py: Updated coverage_gap expected value (0.25→0.35)
- tests/test_impact.py: Updated coverage_gap expected value (0.75→0.65)
- README.md: Updated diff_impact tool description
- CLAUDE.md: Updated risk formula, co-change ingest, diff_impact, coverage gap, risk_meta docs
- COMPLETE_PROJECT_DOCUMENTATION.md: Updated engine.py and impact.py descriptions
- findings/chisel.md → findings/chisel_closed.md: Renamed to indicate closure
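A rough sketch of the two numeric changes above: threshold halving for single-author repos and the finer gap quantization. The function names and signatures are hypothetical, and round-to-nearest is an assumption; the entry only states the step counts and the max(1, threshold // 2) expression.

```python
def effective_coupling_threshold(threshold, distinct_authors):
    """Halve the co-change threshold for single-author history, never below 1."""
    if distinct_authors == 1:
        return max(1, threshold // 2)
    return threshold


def quantize_gap(gap, steps=20):
    """Snap a coverage gap in [0, 1] to the nearest 1/steps increment.

    Rounding to nearest is assumed here; the real helper may floor or ceil.
    """
    gap = min(1.0, max(0.0, gap))
    return round(gap * steps) / steps
```

With 4 steps a gap of 0.33 snaps to 0.25; with 20 steps it snaps to 0.35, which is the kind of shift the finer grid is meant to expose.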
Addressed findings from RecipeLab Review Ten (TestPyramidAnalyzer challenge):
- Bug fix: suggest_tests output explosion. working_tree=True produced 600K+ characters of output in large projects (40+ new files). Added _WORKING_TREE_SUGGEST_LIMIT = 30 constant in engine.py; results are now capped at 30 entries when working_tree=True, preventing unusable output. The limit parameter still overrides via dispatch-layer post-processing.
- Enhancement: diff_impact stem-matching for untracked files. Untracked files were detected by diff_impact but produced no test hits when they had no DB edges. Added stem-matching fallback: untracked files with no direct/co-change/import-graph hits now match tests by filename stem similarity (source: "working_tree", score scaled by 0.4). Only matches with relevance >= 0.5 are included.
- Documentation: coupling limitations. Updated schemas.py coupling tool description and CLAUDE.md to explicitly state that co-change coupling requires multi-author commit history with many small commits, and solo projects should rely on import_partners.

- chisel/engine.py: _WORKING_TREE_SUGGEST_LIMIT constant, cap in tool_suggest_tests, stem-match fallback in tool_diff_impact
- chisel/schemas.py: Updated coupling and diff_impact tool descriptions
- CLAUDE.md: Updated diff_impact, suggest_tests, coupling, and working-tree mode docs
- chisel/bootstrap.py + CHISEL_BOOTSTRAP env var so users load register_extractor() without forking CLI; tree-sitter remains user-installed.
- docs/CUSTOM_EXTRACTORS.md, examples/chisel_bootstrap_example.py, tests/test_bootstrap.py.
- Documentation parity across README, CONTRIBUTING, CLAUDE, ARCHITECTURE, wiki spec-project, AGENT_PLAYBOOK, ZERO_DEPS, COMPLETE_PROJECT_DOCUMENTATION.

- start_job / job_status: stdlib-only background analyze / update via threading + bg_jobs SQLite table; CLI start-job / job-status.
- Impact / suggest_tests: source field (direct | co_change | import_graph | fallback | working_tree).
- risk_map MCP: coverage_mode wired in _TOOL_DISPATCH.
- CI: scripts/check_version.py, scripts/benchmark_chisel.py; docs/AGENT_PLAYBOOK.md, docs/ZERO_DEPS.md, examples/github-actions/chisel.yml.
- Ruff: removed dead cycles / lang assignments.
Re-oriented README, CLAUDE.md, ARCHITECTURE.md, and COMPLETE_PROJECT_DOCUMENTATION.md toward LLM agents and solo developers running multiple agent sessions — emphasizing MCP-first usage, multi-process safety (locks, shared storage), and reframing git-derived “ownership” / “reviewer” tools as audit/heuristic signals rather than team workflows. Updated ecosystem and coupling copy to center import-graph and structured tool results.
Aligned wiki-local/spec-project.md (tool specs, 22 tools, import-graph impact, triage, locks, next_steps, git_error) and CONTRIBUTING.md (agent/solo preamble, architecture table, MCP/tool wiring guidance) to the same positioning.
Stress tested Chisel against Grafana's 21,464-file monorepo. Found and fixed two bugs that only appear at scale: numstat parsing crash and unit-level churn subprocess explosion.
- git_analyzer.py: In _parse_log_output, diff lines with tabs in git log -L output were being split into 3 tab-separated fields and misidentified as numstat entries, causing ValueError: invalid literal for int() with base 10: '+'. Fixed by validating that the first two fields are digits or - before int() conversion.
- engine.py: Unit-level churn via git log -L spawns one subprocess per function. With 62k code units, the analysis ran for 24+ minutes before being killed. Added _UNIT_CHURN_FILE_LIMIT = 2000; repos above this threshold skip per-function churn while still computing file-level churn for all files.
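The numstat validation can be sketched as below. The helper name and return shape are hypothetical; only the rule (first two fields must be digits or - before int() conversion) comes from the fix description.

```python
def parse_numstat_line(line):
    """Return (added, deleted, path) for a numstat line, or None for anything else.

    git prints "-" for both counts on binary files, so "-" maps to None.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return None
    added, deleted, path = parts

    def is_count(field):
        return field == "-" or (field.isascii() and field.isdigit())

    # Diff lines that merely contain two tabs fail this check and are skipped
    # instead of reaching int() and raising ValueError.
    if not (is_count(added) and is_count(deleted)):
        return None

    def to_int(field):
        return None if field == "-" else int(field)

    return to_int(added), to_int(deleted), path
```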
- 14,334 code files scanned
- 62,379 code units extracted (Go + TypeScript + JavaScript)
- 3,870 test files discovered (Go test + Jest + Playwright)
- 22,155 test edges built
- Full analysis: ~3 minutes
- risk_map (14k files): 0.8 seconds (batch queries)
- test_gaps (48k results): 0.2 seconds
- 553 tests, all passing
Three targeted fixes: multi-line /* */ block comment tracking across lines (correctness bug), minimum Python bumped to 3.11 with Z-suffix workaround removed, and test coverage gaps filled for _limit, tool_record_result, tool_stats, and MCP limit parameter.
- _strip_strings_and_comments now accepts and returns in_block_comment: bool state
- _find_block_end propagates block comment state across lines
- Previously, braces inside multi-line /* ... */ comments were counted, potentially returning wrong block-end positions for C/C++/Java/Go/Rust/etc.
- 6 new tests verify enter/exit/spanning behavior
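A simplified sketch of carrying block-comment state across lines. It handles only /* ... */ comments (no string literals or line comments, which the real _strip_strings_and_comments also covers), and both function names are illustrative.

```python
def strip_block_comments(line, in_block_comment):
    """Remove /* ... */ text from one line; return (code, still_in_comment)."""
    out = []
    i = 0
    while i < len(line):
        if in_block_comment:
            end = line.find("*/", i)
            if end == -1:
                return "".join(out), True  # comment continues on the next line
            i = end + 2
            in_block_comment = False
        else:
            start = line.find("/*", i)
            if start == -1:
                out.append(line[i:])
                break
            out.append(line[i:start])
            i = start + 2
            in_block_comment = True
    return "".join(out), in_block_comment


def brace_depth(lines):
    """Net brace depth over several lines, ignoring braces inside block comments."""
    depth = 0
    in_comment = False
    for line in lines:
        code, in_comment = strip_block_comments(line, in_comment)
        depth += code.count("{") - code.count("}")
    return depth
```

Without the state flag, the stray } inside a comment spanning two lines would close the function body early.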
- Minimum Python bumped from 3.9 to 3.11 (Python 3.9 EOL October 2025, 3.10 EOL October 2026)
- Removed dead Z-suffix workaround in _parse_iso_date; fromisoformat handles Z natively since 3.11
- CI matrix updated: 3.9-3.13 → 3.11-3.14
- Removed 3.9/3.10 classifiers
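For context, a sketch of what such a Z-suffix workaround typically looks like and why it becomes dead code on 3.11+. The exact shape of the removed code is an assumption; the version guard is kept here only so the snippet runs on older interpreters too.

```python
import sys
from datetime import datetime


def parse_iso_date(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # Before Python 3.11, datetime.fromisoformat() rejected the 'Z' suffix,
    # so it had to be rewritten as an explicit offset first.
    if sys.version_info < (3, 11) and value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```

With the minimum version at 3.11, the guarded branch never runs and the function reduces to a single fromisoformat() call.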
- test_engine.py: test_tool_record_result and test_tool_stats at integration level
- test_cli.py: TestLimitParameter class covering _limit() helper, CLI truncation, non-list passthrough
- test_mcp_server.py: test_call_with_limit for MCP server limit pass-through
- 553 tests total, all passing
Four architectural improvements: pluggable AST extraction for tree-sitter/LSP integration, batch SQL to eliminate N+1 in risk_map, process-level shared locks for concurrent reads, cross-platform ProcessLock (Windows support via LockFileEx).
- register_extractor(language, fn) stores custom extractors in _custom_extractors dict
- extract_code_units() checks custom extractors first, falls back to built-in regex
- unregister_extractor(language) reverts to built-in (raises KeyError if not registered)
- get_registered_extractors() returns shallow copy for introspection
- Zero new dependencies: registry is just callable hooks
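A minimal sketch of the registry behavior listed above. The extractor call signature (file_path, content), the language argument to extract_code_units(), and the _builtin_extract placeholder are assumptions for illustration; the real functions may differ.

```python
_custom_extractors = {}


def register_extractor(language, fn):
    """Install a custom extractor callable for one language."""
    _custom_extractors[language] = fn


def unregister_extractor(language):
    """Revert to the built-in extractor; raises KeyError if none was registered."""
    del _custom_extractors[language]


def get_registered_extractors():
    """Shallow copy, so callers cannot mutate the registry by accident."""
    return dict(_custom_extractors)


def _builtin_extract(file_path, content, language):
    # Placeholder for the built-in regex/ast extraction.
    return []


def extract_code_units(file_path, content, language):
    custom = _custom_extractors.get(language)
    if custom is not None:
        return custom(file_path, content)  # custom hook takes priority
    return _builtin_extract(file_path, content, language)
```

A tree-sitter or LSP backed extractor would be plugged in by passing any callable with the same shape to register_extractor().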
- 5 new batch methods: get_edges_for_code_batch, get_code_units_by_files_batch, get_co_changes_batch, get_churn_stats_batch, get_blame_batch
- _chunked() helper splits lists into chunks of 900 to stay under SQLite's 999-variable limit
- impact.get_risk_map() rewritten to use batch queries: ~5 total queries instead of N*5
- compute_risk_score() unchanged for single-file use
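The chunked IN-query pattern might look roughly like this. The table and column names (churn_stats, file_path, churn_score) and the function signature are assumptions for the sketch; only the chunk size of 900 comes from the entry above.

```python
import sqlite3

_CHUNK_SIZE = 900  # stays below SQLite's historical 999 bound-variable limit


def chunked(items, size=_CHUNK_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def get_churn_stats_batch(conn, file_paths):
    """Fetch churn scores for many files with one query per chunk, not per file."""
    stats = {}
    for chunk in chunked(list(file_paths)):
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            "SELECT file_path, churn_score FROM churn_stats "
            f"WHERE file_path IN ({placeholders})",
            chunk,
        )
        for file_path, churn_score in rows:
            stats[file_path] = churn_score
    return stats
```

For 2,000 files this issues three queries (900 + 900 + 200) instead of 2,000.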
- All 12 read tool methods now acquire _process_lock.shared() (outer) + lock.read_lock() (inner)
- tool_record_result now acquires _process_lock.exclusive() + lock.write_lock()
- analyze() and update() already used exclusive locks, so no change
- Lock nesting order: process lock (outer) → RWLock (inner), always consistent
- Module-level _IS_WINDOWS = sys.platform == "win32" for platform detection
- Unix: fcntl.flock (unchanged behavior)
- Windows: ctypes calls to kernel32.LockFileEx / UnlockFileEx; supports both shared and exclusive locks
- _flock(fd, exclusive) and _funlock(fd) are platform-neutral module functions
- ProcessLock._acquire(exclusive: bool) replaces platform-specific lock type constants
- 18 new tests: extractor registry (6), batch queries (7), cross-platform lock (3), engine lock wiring (2)
- 540 tests total, all passing
Full codebase audit across all 12 source files using 7 parallel exploration agents. Fixed latent bugs, simplified code patterns, modernized Python syntax, consolidated dispatch logic, corrected stale documentation.
- engine.py: _parse_and_store_code_units() read files with Path.read_text() without error handling. If a file vanished between scan and parse, the entire analysis crashed with an unhandled OSError. Now gracefully skips the file.
- schemas.py: The _LIMIT_PROP dict was shared by reference across all 11 tool schemas, so any mutation would silently corrupt all schemas. Now each schema gets its own copy via dict().
- cli.py: record-result without --passed or --failed silently defaulted to "passed", making the --passed flag useless. The mutually exclusive group is now required=True.
- glossary.md: Co-change coupling risk weight listed as 0.3 (old 4-component formula) instead of current 0.25.
- glossary.md: Tool dispatch table reference pointed to mcp_server.py instead of schemas.py (moved in v0.4.0).
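The record-result fix relies on standard argparse behavior, sketched here. The parser layout is simplified and hypothetical (the real subcommand presumably takes further arguments); only the --passed / --failed group with required=True is taken from the entry.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="chisel")
    subparsers = parser.add_subparsers(dest="command")
    record = subparsers.add_parser("record-result")
    # required=True makes argparse reject the call when neither flag is given,
    # instead of silently falling through to a default outcome.
    outcome = record.add_mutually_exclusive_group(required=True)
    outcome.add_argument("--passed", action="store_true")
    outcome.add_argument("--failed", action="store_true")
    return parser
```

Passing both flags, or neither, now exits with a usage error rather than recording a guess.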
- engine.py: _detect_diff_base() for-loop + early return replaced with next() generator expression
- engine.py: Removed pathlib.Path import; single Path.read_text() replaced with open() + try/except
- ast_utils.py: Removed getattr(node, "end_lineno", None) guards (unnecessary since Python 3.9+ guarantees end_lineno)
- ast_utils.py: Removed redundant lang is None check in extract_code_units() (None is never a key in _EXTRACTORS)
- storage.py: _normalize_unit_name simplified from if is not None to or ""
- impact.py: Loop-building-a-set in get_risk_map() replaced with set comprehension
- git_analyzer.py: Walrus operator in get_changed_files() eliminates double .strip() per line
- test_mapper.py: extract_test_dependencies() converted from instance method to @staticmethod with _DEP_EXTRACTORS dispatch dict (replaces 11-branch if/if chain)
- test_mapper.py: Uses normalize_path() from project.py instead of os.path.relpath() for cross-platform path consistency
- Updated all docs to reflect changes: COMPLETE_PROJECT_DOCUMENTATION.md, CLAUDE.md, CHANGELOG.md, glossary.md, spec-project.md
- Fixed stale co-change weight, tool dispatch location, dep graph, CLI subcommand count
- Added Python 3.14 classifier to pyproject.toml
- All 522 tests pass, no regressions
Addressed three structural gaps identified in codebase assessment: (1) fragile regex AST extraction for newer languages, (2) name-only test edge matching creating false positives, (3) missing PyPI publish automation. Added proximity-based edge weighting, Python import-path matching, improved regex patterns for 8 languages, and comprehensive test coverage for all of them.
- Proximity weighting: _compute_proximity_weight() in test_mapper.py scores test-to-code edges based on directory distance (1.0 same dir → 0.4 distant). Stored in the existing weight column on test_edges.
- Python import-path matching: _matches_import_path() resolves from myapp.utils import foo to myapp/utils.py:foo specifically, preventing false edges to unrelated foo functions in other modules. Falls back to name-based matching for calls and non-Python languages.
- Impact on existing behavior: All edge weights are ≤ 1.0 (same as before for same-directory matches). impact.py already uses the weight field, so impact analysis automatically benefits from higher-precision edges.
- Nested generics (C#, Java, C++): (?:<[^>]*>) → (?:<(?:[^<>]|<[^>]*>)*>), which handles Dictionary<string, List<int>>, Map<String, List<Integer>>
- Annotations/attributes (C#, Java, Swift): Added prefix patterns ^(?:\s*@\w+...)* and ^(?:\s*\[[^\]]*\]...)* to handle @Override, @Entity, [Test], [Serializable], @objc
- Kotlin extension functions: fun\s+(?:[A-Za-z_]\w*\.)?(?P<name>...), so fun String.toSnake() now extracts toSnake (was extracting String)
- C++ template functions + destructors: Added template<...> prefix, ~? in name capture for destructors
- Dart factory/getters/setters: Regex accepts factory keyword and get / set keyword before function names
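The nested-generics change can be checked in isolation. The two sub-patterns below are copied from the entry above; the helper function and the sample signatures are illustrative only.

```python
import re

# Sub-patterns for a generic argument list, before and after the change.
OLD_GENERIC = re.compile(r"(?:<[^>]*>)")
NEW_GENERIC = re.compile(r"(?:<(?:[^<>]|<[^>]*>)*>)")


def generic_args(pattern, signature):
    """Return the first generic argument list matched in a signature, or None."""
    match = pattern.search(signature)
    return match.group(0) if match else None
```

On Dictionary<string, List<int>> the old pattern stops at the first > and captures <string, List<int>, leaving a dangling >, while the new one captures the full <string, List<int>>. The new pattern covers one level of nesting; deeper nesting would still be cut short.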
- Added .github/workflows/publish.yml: triggers on tag push (v*), builds with python -m build, publishes via OIDC trusted publishing (pypa/gh-action-pypi-publish)
- spec-project.md: Complete rewrite — all 15 tools specified (was missing diff_impact, update, test_gaps, record_result, stats), all 12 languages in table with AST method details, all 17 CLI subcommands listed, new "Test Edge Weighting" section
- Updated CLAUDE.md with edge weighting and AST improvement notes
- 63 new tests for 8 newer languages (C#: 9, Java: 8, Kotlin: 8, C++: 8, Swift: 7, PHP: 6, Ruby: 8, Dart: 9)
- 9 new tests for proximity weighting and import-path matching
- 522 tests total, all passing
Full codebase audit across all 12 source files using 6 parallel exploration agents. Removed dead code, consolidated duplicated logic, modernized syntax, fixed documentation drift, and added missing error logging.
- project.py: self._fd = None on ProcessLock was never read or assigned after init
- project.py: exclusive() and shared() were near-identical 10-line methods differing only in lock type; consolidated into shared _acquire(lock_type) helper
- test_mapper.py: parse_test_file() duplicated the exact logic from _check_rust_test() and _check_cpp_test() inline; extracted _check_rust_test_content() and _check_cpp_test_content() content-only helpers shared by both paths
- storage.py: timeout=30 on sqlite3.connect() was redundant because PRAGMA busy_timeout=30000 (set on the next line) overrides it. Removed the dead parameter
- storage.py: Restructured get_direct_impacted_tests() condition; the old len(changed_functions) > 0 followed by a separate changed_functions is not None check was logically redundant
- mcp_server.py: Negative Content-Length values were not rejected (only zero was checked)
- mcp_stdio.py: call_tool() caught exceptions but never logged them server-side, making debugging impossible
- metrics.py: Docstring claimed "no file path filtering needed" for unit-level churn, but the code does filter
- project.py: str.removeprefix("./") replaces manual if startswith / slice pattern (Python 3.9+)
- git_analyzer.py: Walrus operator (:=) for regex match-then-check in _parse_blame_output() and _parse_diff_functions()
- engine.py: functions if functions else None → functions or None
- test_mapper.py: lang == "java" or lang == "kotlin" → lang in ("java", "kotlin")
- impact.py: suggest_reviewers() parsed the same ISO dates 2-3 times per commit; now caches parsed datetimes per author
- Updated risk formula in spec-project.md (was still showing old 4-component 0.4/0.3/0.2/0.1 weights instead of current 5-component 0.35/0.25/0.2/0.1/0.1)
- Updated tool count from "10" to "15" in spec-project.md
- Updated README language/framework lists to include all 12 supported languages
- Updated version across __init__.py, pyproject.toml, COMPLETE_PROJECT_DOCUMENTATION.md, CHANGELOG.md
- All 450 tests pass, no regressions
Full codebase audit across all 10 source files and 11 test files. Fixed bugs, removed dead code, eliminated redundancy, and improved encapsulation.
- engine.py: _scan_code_files() was case-sensitive for extensions (.PY files skipped); now uses .lower()
- engine.py: _scan_code_files() called inside write_lock() in update(), blocking readers during filesystem walk; moved outside lock
- git_analyzer.py: compute_churn() commit_count included commits with unparseable dates that were skipped in analysis; now counts only analyzed commits
- git_analyzer.py: _parse_diff_functions() fell back to raw hunk context as function name (producing garbage like class Foo:); now skips non-function contexts
- cli.py: --passed / --failed flags on record-result were not mutually exclusive; giving both silently ignored --failed; now uses add_mutually_exclusive_group()
- cli.py: cmd_serve did not clean up engine on non-KeyboardInterrupt exceptions; now uses try/finally
- storage.py: Removed get_latest_commit_date(), which was never called from any production code
- engine.py: _detect_diff_base() no longer calls private GitAnalyzer._run_git(); new public get_current_branch() and branch_exists() methods added to GitAnalyzer
- cli.py: Added --no-exclude-tests flag to test-gaps subcommand (was present in MCP schema but missing from CLI)
- cli.py: Removed duplicate shared parent from top-level parser (subcommands already inherit it)
- storage.py: Fixed get_stats() docstring (blame_blocks → blame_cache)
- storage.py: Added ORDER BY to get_all_test_units() for deterministic results
- storage.py: cleanup_orphaned_test_results() now passes tuple instead of list to _execute() for consistency
- test_mapper.py: Eliminated triple file read for Rust test files (content now read once, reused for framework detection)
- mcp_stdio.py: _run_server() now reuses create_server() instead of duplicating engine creation logic
- cli.py: Removed extra blank line between _limit() and command handlers
- Updated 4 test assertions to match code changes (removed 2 dead-code tests, updated 2 CLI mock assertions)
- All 404 tests pass, ruff lint clean
Built the complete Chisel system from scratch: a zero-dependency test impact analysis and code intelligence tool for LLM agents.
- SQLite persistence layer (storage.py): WAL-mode database with 9 tables (code_units, test_units, test_edges, commits, commit_files, blame_cache, co_changes, churn_stats, file_hashes). All CRUD operations with upsert semantics. Foreign key enforcement intentionally disabled to support stale test detection via orphaned edge references.
- Multi-language AST extraction (ast_utils.py): Python extraction using the ast module with regex fallback for syntax errors. JavaScript/TypeScript, Go, and Rust extraction via regex patterns. CodeUnit dataclass for representing functions, classes, structs, enums, and impl blocks. Shared _SKIP_DIRS constant for directory filtering.
- Git analysis (git_analyzer.py): Parsing of git log --numstat and git blame --porcelain output via subprocess (no gitpython dependency). Churn score computation using the formula sum(1 / (1 + days_since_commit)). Ownership computation from blame blocks. Co-change coupling detection across file pairs (threshold: >= 3 co-commits).
- Test mapper (test_mapper.py): Automatic test file discovery with framework detection for pytest, Jest, Go test, Rust #[test], and Playwright. Dependency extraction (imports and function calls) per language. Test-to-code edge building by matching extracted dependencies against known code units.
- Impact analysis (impact.py): Finding impacted tests for changed files via direct test edges and transitive co-change coupling. Risk scoring with formula: 0.4*churn + 0.3*coupling_breadth + 0.2*(1-test_coverage) + 0.1*author_concentration. Stale test detection (tests referencing removed code units). Reviewer suggestions based on commit activity.
- Engine (engine.py): Orchestrator class tying together Storage, GitAnalyzer, TestMapper, ImpactAnalyzer, and RWLock. Full analyze() pipeline: scan code files, extract code units, discover tests, parse git history, compute churn and co-changes, run blame, build test edges. Incremental update() method using file content hashes. 10 tool_*() methods, one per MCP tool.
- CLI (cli.py): argparse-based CLI with 12 subcommands (analyze, impact, suggest-tests, churn, ownership, coupling, risk-map, stale-tests, history, who-reviews, serve, serve-mcp). JSON output mode via --json flag.
- HTTP MCP server (mcp_server.py): Threaded HTTPServer with GET /tools, GET /health, POST /call endpoints. JSON Schema definitions for all 10 tools. Tool dispatch table mapping tool names to engine methods.
- stdio MCP server (mcp_stdio.py): Async MCP-compliant server using the optional mcp Python package. Communicates over stdin/stdout for Claude Desktop and Cursor integration.
- Read-write lock (rwlock.py): Multiple concurrent readers or one exclusive writer, used by the engine for thread-safe storage access.
- Test suite: 305 tests covering all modules.
- Zero external dependencies (stdlib only).
- Git as the sole source of truth (subprocess, not gitpython).
- Incremental analysis via file content hashing.
- Blame caching keyed by file content hash.
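The two formulas quoted in the initial build can be written out directly. This is a sketch under assumptions: the function names are illustrative, risk inputs are taken to be already normalized to [0, 1], and clamping future-dated commits to zero days is an addition not stated in the log.

```python
from datetime import datetime, timedelta, timezone


def churn_score(commit_dates, now):
    """sum(1 / (1 + days_since_commit)) over all commits; recent commits weigh more."""
    return sum(1.0 / (1 + max(0, (now - date).days)) for date in commit_dates)


def risk_score(churn, coupling_breadth, test_coverage, author_concentration):
    """Original 4-component weighting; later entries describe a 5-component formula."""
    return (
        0.4 * churn
        + 0.3 * coupling_breadth
        + 0.2 * (1 - test_coverage)
        + 0.1 * author_concentration
    )
```

A commit from today contributes 1.0 to churn, one from yesterday 0.5, and one from three days ago 0.25, so the score decays quickly with age.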
Added MIT license and extended churn analysis to the function level.
- MIT license: Added LICENSE file to the project root.
- Function-level git log (git_analyzer.py): New get_function_log() method using git log -L :funcname:file to retrieve commits that touched a specific function.
- Unit-level churn (engine.py): analyze() now computes churn stats per function (not just per file). For each code unit of type function or async_function, the engine calls get_function_log() and stores the resulting churn stats with the unit name. The compute_churn() method was updated to accept a unit_name parameter: when provided, all commits are assumed pre-filtered by git log -L and used directly without file-path filtering.
Major cleanup pass: refactored storage to use a single persistent connection, differentiated ownership from reviewer suggestions, removed dead code, and fixed multiple bugs.
- Storage refactor (storage.py): Replaced per-method connection creation with a single persistent SQLite connection (check_same_thread=False). WAL mode and PRAGMA settings applied once at init. Added close() method for proper lifecycle management. _connect() now returns the persistent connection rather than creating a new one.
- Ownership vs. reviewers differentiation (impact.py):
  - get_ownership() returns blame-based authorship with role: "original_author" -- shows who wrote the code.
  - suggest_reviewers() returns commit-activity-based suggestions with role: "suggested_reviewer" -- shows who has been actively maintaining the file and is best positioned to review.
  - MCP tool descriptions in mcp_server.py updated to clarify the distinction.
- Shared constants (ast_utils.py): Moved _SKIP_DIRS to ast_utils.py as the canonical location. Both engine.py and test_mapper.py now import it from there instead of defining their own copies.
- Scoped analysis (engine.py): tool_analyze() / analyze() now accepts a directory parameter to scope code scanning to a subdirectory while keeping git log and test discovery project-wide.
- Helper extraction (impact.py): New _aggregate_blame_lines() helper to deduplicate blame aggregation logic used by both get_ownership() and _author_concentration().
- Import consolidation (mcp_stdio.py): _TOOL_DISPATCH and _TOOL_SCHEMAS now imported from mcp_server.py instead of being duplicated.
- Module-level compilation: Blame header regex in git_analyzer.py compiled once at module level. defaultdict imports in impact.py moved to module level.
- Redundant compute_file_hash call per code unit during analysis (was called once per unit instead of once per file).
- First-write-wins logic in get_impacted_tests() was dropping higher-score test edges; changed to keep the highest score.
- _strip_strings_and_comments() incorrectly treated # as a comment for JS/TS/Go/Rust (only valid for Python, which uses _py_block_end instead).
- cli.main() discarded handler return values.
- Go import parsing failed on aliased imports.
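The first-write-wins fix amounts to a max-merge when the same test is reached by several edges. A small sketch, with a hypothetical helper name and a (test_id, score) pair shape chosen for illustration:

```python
def merge_test_scores(hits):
    """Collapse (test_id, score) pairs, keeping the highest score per test."""
    best = {}
    for test_id, score in hits:
        # First-write-wins would be `if test_id not in best`, which drops a
        # stronger edge that happens to arrive after a weaker one.
        if score > best.get(test_id, float("-inf")):
            best[test_id] = score
    return best
```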
- Unreachable loop in engine.py (lines 98-102).
- Unused _print_table function in cli.py.
- Unused imports across test files.
- Dead framework parameter in extract_test_dependencies.
- 3 new tests added (313 total).
Third comprehensive code review using 10 parallel agents to audit every module, cross-validate inter-module contracts, and identify semantic bugs. Fixed 7 bugs (including 2 logic errors that silently produced wrong results), consolidated duplicate code, and hardened error handling.
- impact.py: changed_functions or None converted an empty list [] to None, causing get_impacted_tests() to return ALL tests when the caller explicitly said "no functions changed" (should return none). Root cause: Python's [] or None evaluates to None because an empty list is falsy.
- impact.py: get_risk_map(directory="src") used bare startswith("src"), which incorrectly matched paths like src_backup/file.py. Changed to startswith("src/") with proper path boundary.
- cli.py: cmd_stale_tests displayed a nonexistent "reason" field (always blank). The actual field from detect_stale_tests() is "edge_type". This was masked by the old defensive .get("reason", "") fallback.
- mcp_server.py: ChiselMCPServer.stop() closed the engine but didn't set self._engine = None, leaving a stale reference. Inconsistent with _httpd and _thread cleanup.
- mcp_stdio.py: create_server() created a ChiselEngine captured in a closure with no cleanup path. Engine now stored as server._engine for caller cleanup.
- git_analyzer.py: compute_churn() called _parse_iso_date() without try-except, so a malformed commit date would crash the entire churn computation. compute_co_changes() already had the guard; now both are consistent.
- tests/test_cli.py: 6 test mocks had incorrect field names (score vs relevance, reason vs edge_type, missing percentage/recent_commits/date/author/message). These never failed because the old CLI code used defensive fallback chains that silently returned defaults.
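The two logic errors at the top of this list are easy to reproduce. The helpers below are illustrative, not the project's functions; the test-record shape with a "function" key is an assumption.

```python
def select_tests(all_tests, changed_functions=None):
    """None means 'no function filter'; an empty list means 'nothing changed'."""
    # Checking `is None` keeps the two cases apart; `changed_functions or None`
    # would collapse [] into None and return every test.
    if changed_functions is None:
        return list(all_tests)
    wanted = set(changed_functions)
    return [test for test in all_tests if test["function"] in wanted]


def in_directory(path, directory):
    """True only when `path` lies under `directory`, respecting the path boundary."""
    prefix = directory.rstrip("/") + "/"
    return path.startswith(prefix)
```

The trailing slash is what stops src from matching src_backup/file.py.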
- ast_utils.py: Replaced 3 near-identical functions (_extract_js_ts, _extract_go, _extract_rust) with a shared _extract_brace_lang(file_path, content, patterns) helper. Each language now defines a pattern table (_JS_TS_PATTERNS, _GO_PATTERNS, _RS_PATTERNS), a list of (regex, unit_type) tuples where unit_type can be a string or a callable(match) -> (name, type) for dynamic extraction (Go's kind group, Rust's impl name stripping). Net reduction: ~50 lines.
- storage.py: Deduplicated the identical 6-line SELECT/JOIN clause in get_direct_impacted_tests() into a local base_sql variable shared by both query paths.
- cli.py: Removed _print_result() function (only used once by cmd_churn, dict branch was unreachable). Inlined the list iteration.
- cli.py: Stripped all .get("x", .get("y", ...)) defensive fallback chains across 10 command handlers. The engine returns well-defined dicts, and the fallbacks masked field name mismatches (proven by the test mock fixes above).
- Updated 6 CLI test mocks to use correct field names matching actual engine output contracts.
- Updated test_risk_map_with_directory to use directory-style paths (src/app.py) instead of exploiting the old buggy prefix behavior.
- Fixed misleading comment on _py_block_end return value.
- 334 tests (count unchanged).
Full codebase audit using parallel agents to review every module, cross-validate all inter-module dependencies, and verify test coverage. Fixed 5 bugs, removed 5 instances of dead code, and made 4 performance/quality improvements.
- git_analyzer.py: _parse_diff_functions() returned full declaration lines (e.g. def foo():) instead of bare function names (foo). This caused the changed_functions filter in impact.py:get_impacted_tests() to silently match nothing, making function-level impact filtering a no-op.
- cli.py: cmd_suggest_tests read item.get("score") but ImpactAnalyzer.suggest_tests() returns key "relevance". Score always displayed as empty string in human output.
- engine.py: tool_churn fell back to returning all file churn stats even when a specific unit_name was requested and not found. Now returns [] for missing units.
- cli.py: All 10 cmd_* handlers created ChiselEngine instances without closing them, leaking SQLite connections. Changed to with ChiselEngine(...) as engine:.
- mcp_server.py: ChiselMCPServer.stop() didn't call engine.close(), leaking the SQLite connection.
- Storage.delete_test_units_by_file() and Storage.delete_edges_for_test() -- defined but never called from any module.
- Unreachable framework == "rust" branch in TestMapper.detect_framework() -- no pattern in _FRAMEWORK_PATTERNS produces "rust".
- Unreachable handler is None guard in cli.main() -- argparse validates subcommands.
- Unused _project_dir and _storage_dir fields on ChiselMCPServer.
- engine.update() called parse_log() twice (once partial, once full) -- now calls it once.
- test_mapper.build_test_edges() re-read the same file for every test unit from that file -- added file content cache.
- impact.py moved lazy GitAnalyzer import to module top-level (no circular import risk).
- ast_utils.extract_code_units() added None guard for extractor lookup defensively.
- Engine fixture in test_engine.py changed from return to yield + close() to avoid connection leaks.
- Simplified overly complex test_cmd_serve_human (removed dead _orig import, unnecessary sys.modules manipulation).
- Added _make_engine_mock() helper for CLI handler tests to support context manager protocol.
- Updated assertions in test_git_analyzer.py and removed tests for deleted Storage methods.
- 334 tests (removed 2 tests for deleted methods, adjusted 3 assertions).