release: v0.5.0 #79
Merged
Two cosmetic bugs in cgh status:
- A fresh repo with no DB on disk said '(would create graph.db)', but DuckDB has been the default backend since v0.4, so a first index creates graph.duckdb. Corrected the text.
- The Endpoints row rendered as a bare ', ' when counts came from the FTS-only or unknown fallback (no graph read). Now shows 'unknown (graph locked)' or 'unknown' to match the Files cell.
fix: cgh status shows graph.duckdb default + clean empty Endpoints
cgh treated the working directory literally and only looked for .codegraph/ right there, so running cgh status / index / serve from a subdir of an initialized repo reported no index even though the root one directory up had it. find_codegraph_root walks up to the nearest ancestor with a .codegraph/, the way git finds its repo root via .git. main() resolves --root through it for every command except init/setup (which create in the literal directory) and the internal _serve_owner / _reindex_hook (which get an explicit root). The 'Using codegraph root: ...' hint prints to stderr so stdout and --json output stay clean. Cross-platform: uses pathlib resolve()/.parents, which stop at the drive root on Windows and the filesystem root on POSIX. Nearest .codegraph/ wins, so a federated child's subdir resolves to the child, not the parent. Tests cover current-dir, ancestor, absent, and nearest-wins.
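The upward walk described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name matches the commit message, but the signature and the `None` return for the "absent" case are assumptions.

```python
from pathlib import Path
from typing import Optional, Union


def find_codegraph_root(start: Union[str, Path]) -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) holding a .codegraph/ dir.

    Assumption: returns None when no ancestor has one; the real helper may
    signal the absent case differently.
    """
    here = Path(start).resolve()
    # `parents` ends at the drive root on Windows and at "/" on POSIX,
    # so the loop terminates on both platforms.
    for candidate in (here, *here.parents):
        if (candidate / ".codegraph").is_dir():
            return candidate  # nearest match wins
    return None
```

Because the search starts at the directory itself and moves outward, a federated child that has its own `.codegraph/` is found before its parent.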
feat: find the codegraph root from a subdirectory (git-style)
- Auth token compared with hmac.compare_digest instead of != (timing-safe); it is the loopback bridge's only auth check.
- Removed the dead auth env-injection path (inject_auth_key_into_mcp_json, validate_server_auth_key were never called) and corrected the auth.py lifecycle docstring: the 0600 file contents are the shared secret, there is no env hand-off. The .codegraph/ dir is now chmod 0700 at creation so auth.key's parent is owner-only too.
- index_changed_files rejects a 'since' ref starting with '-' and appends '--', so a value like '--output=PATH' can't be parsed as a git flag (argument injection via the MCP arg).
- pattern_search passes the user pattern after '--' (ripgrep) and via '-e' (git-grep), so a pattern like '--pre=sh' can't reach ripgrep's preprocessor (code exec).
- force_index refuses absolute paths resolving outside the repo root (new _within_repo guard) and surfaces them as refused_outside_repo, instead of indexing arbitrary files the project never declared.
- Pinned the mermaid CDN script to 11.4.1 with an SRI integrity hash and crossorigin, so a compromised CDN can't run JS in the generated report.

Tests: _within_repo containment cases; full suite green (391).
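Three of the guards above are small enough to sketch. The helper names, signatures, and the exact git subcommand below are assumptions for illustration; only `hmac.compare_digest`, the leading-'-' rejection, the trailing '--', and the containment idea come from the commit message.

```python
import hmac
from pathlib import Path
from typing import List, Union


def token_matches(presented: str, expected: str) -> bool:
    # Constant-time comparison; a plain != can leak how many leading
    # characters matched through timing.
    return hmac.compare_digest(presented.encode(), expected.encode())


def git_diff_args(since: str) -> List[str]:
    # Hypothetical command shape. A ref that starts with '-' would be parsed
    # by git as an option, so it is rejected; '--' ends option parsing.
    if since.startswith("-"):
        raise ValueError(f"refusing ref that looks like a flag: {since!r}")
    return ["git", "diff", "--name-only", since, "--"]


def within_repo(path: Union[str, Path], repo_root: Union[str, Path]) -> bool:
    # Resolve both sides so '..' segments and symlinks cannot escape the root.
    root = Path(repo_root).resolve()
    target = Path(path)
    if not target.is_absolute():
        target = root / target
    target = target.resolve()
    return target == root or root in target.parents
```

The list form of the git arguments matters as much as the checks: passing a list to `subprocess.run` without `shell=True` keeps the ref from being re-parsed by a shell.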
fix(security): close audit findings in the MCP owner and tools
- purge_file_data now also deletes the inbound side of self-referential edges (CALLS/INHERITS), matching Kuzu's DETACH DELETE. On DuckDB, find_callers had been returning ghost callers after a function changed.
- _resolve_calls prefers a same-file definition (a local run() no longer links to every run() in the repo) and memoizes name->ids per file, cutting the per-call-site query fan-out.
- index_file enforces max_file_size_kb and ignore_patterns, which were defined and documented but never applied; index_repo loads the config once per scan and threads it in (no per-file config read).
- _fts and .cghignore caches are keyed by repo root instead of a global singleton, so a multi-repo process no longer crosses streams.
- IMPORTS edges from a barrel re-export collapse to one whole-module edge past 50 named symbols instead of flooding the graph.
- Markdown links resolve relative to the file's own directory, so ./foo.md actually matches; collapsed the dead terraform kind branch; git-diff discovery timeout raised 10s->30s; find prunes ignore dirs at the walk level; reset_connection and rows() surface failures instead of swallowing; failed scan deletions are logged.

Tests: parity (no ghost caller after re-index), same-file CALLS scoping, size cap + ignore-pattern skip, force bypass. Full suite green (396).
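The same-file preference in _resolve_calls can be pictured as follows. This is a sketch under assumptions: the function name, the shape of the definitions map, and the id strings are hypothetical, and the real resolver queries the graph rather than a dict.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: callee name -> list of (defining file, function id).
Definitions = Dict[str, List[Tuple[str, str]]]


def resolve_call(name: str, caller_file: str, definitions: Definitions) -> List[str]:
    """Return target function ids for a call to `name` made from `caller_file`."""
    candidates = definitions.get(name, [])
    # Prefer definitions in the caller's own file ...
    local = [fid for path, fid in candidates if path == caller_file]
    if local:
        return local
    # ... and only fall back to repo-wide name matching when none exist.
    return [fid for _, fid in candidates]
```

With this ordering a local run() links to itself only, while a call to a name defined elsewhere still resolves by name as before.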
- Hoisted the parent+children fan-out into two helpers in federation.py (federate_scoped / federate_flat). tools_query, tools_arch, and tools_docs each had a near-identical copy under two different names; they now call the shared helper. CLAUDE.md already told contributors to reuse such a helper, but none existed.
- Switched the server modules off the deprecated for_each_child_kuzu alias to the canonical for_each_child_graphdb (DuckDB is the default backend, so the _kuzu name misled). Aliases stay one more release. Fixed the stale 'Same as for_each_kuzu' docstring.
- __main__.py: factored the --root flag, repeated ~30 times, into a single _add_root(p) helper; typed every CLI handler as args: argparse.Namespace; replaced the two literal '---' separators in the doctor help with a colon.

No behavior change. Full suite green (396), ruff + no-ai-tells clean.
refactor: shared federation fan-out, drop _kuzu names, CLI cleanup
cmd_init (438 lines) split into named phase helpers: _print_prior_state, _detect_ai_tools, _select_tools, _count_parseable_files, _print_file_counts, _print_init_summary. Buried mid-function imports moved into the helpers that use them; handler typed args: argparse.Namespace. cmd_init now reads as a sequence of steps. Pure readability, no behavior change.
cmd_status's owner/RO/FTS fallback ladder extracted into _status_via_owner, _status_via_ro_open, _status_via_fts (+ _empty_status_source for the shared dict shape). cmd_status selects the first tier whose counts_source != 'none' (same order/conditions) and renders once. Rendered output unchanged. Adds tests/test_cli/test_status_fallback.py covering the dict shape and the RO and FTS tiers without spawning an owner. Pure readability + tests.
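The tier selection reduces to "first source that reports counts wins". A minimal sketch, assuming each tier is a zero-argument callable returning a dict; only the counts_source key and its 'none' sentinel come from the commit message, the other keys and the helper name first_available are hypothetical.

```python
from typing import Callable, Dict, Iterable

StatusSource = Dict[str, object]


def empty_status_source() -> StatusSource:
    # Shared dict shape; 'files' and 'endpoints' are illustrative keys.
    return {"counts_source": "none", "files": None, "endpoints": None}


def first_available(tiers: Iterable[Callable[[], StatusSource]]) -> StatusSource:
    """Try each tier in order and return the first one that produced counts."""
    for tier in tiers:
        result = tier()
        if result["counts_source"] != "none":
            return result
    return empty_status_source()
```

Keeping every tier on the same dict shape is what lets the caller render once, regardless of which tier answered.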
refactor: decompose cmd_init into phase helpers
refactor: extract cmd_status 3-tier fallback + add tests
Add four read-only, federated graph-insight tools in codegraph/server/tools_insight.py:
- file_summary: one-shot file orientation (role/layer/lang/module_doc, defined functions and classes, resolved imports, and importers).
- impact_of: reverse blast radius via bounded reverse-BFS over CALLS (callers) or IMPORTS (importers), grouped by role/layer with reaching endpoints. Carries the name-matched CALLS over-count caveat.
- path_between: shortest path over CALLS or IMPORTS via forward BFS, reported per scope so it never crosses repo boundaries.
- import_cycles: SCC detection (iterative Tarjan) over the IMPORTS graph.

Register the module in server/__init__.py. Add role/layer filters to search_symbols and symbol_lookup in tools_query.py (backward compatible, empty = no filter). Tools are tested via a fake mcp whose .tool() decorator captures the closures unchanged, then calling them against a tiny indexed tmp repo.
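For readers unfamiliar with the iterative form of Tarjan's algorithm, here is a self-contained sketch over a plain adjacency dict. It is not the code in tools_insight.py: the graph representation, the function names, and the rule that a cycle is any SCC with more than one file or a self-import are assumptions.

```python
from typing import Dict, Iterable, List

Graph = Dict[str, Iterable[str]]  # file -> files it imports


def strongly_connected_components(graph: Graph) -> List[List[str]]:
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack = set()
    sccs: List[List[str]] = []
    counter = 0

    for root in graph:
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        # Explicit work stack of (node, neighbor iterator) replaces recursion.
        work = [(root, iter(graph.get(root, ())))]
        while work:
            node, neighbors = work[-1]
            descended = False
            for nxt in neighbors:
                if nxt not in index:
                    index[nxt] = lowlink[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(graph.get(nxt, ()))))
                    descended = True
                    break
                if nxt in on_stack:
                    lowlink[node] = min(lowlink[node], index[nxt])
            if descended:
                continue
            work.pop()  # node is fully explored
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]:  # node is the root of an SCC
                component = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                sccs.append(component)
    return sccs


def import_cycles(graph: Graph) -> List[List[str]]:
    cycles = []
    for component in strongly_connected_components(graph):
        node = component[0]
        if len(component) > 1 or node in graph.get(node, ()):
            cycles.append(sorted(component))
    return sorted(cycles)
```

The explicit work stack avoids Python's recursion limit on deep import chains, which is the usual reason to prefer the iterative form.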
feat: graph-insight MCP tools (impact_of, file_summary, path_between, import_cycles) + role/layer filters
Endpoint extraction now covers Django urlpatterns, NestJS decorators, Spring @*Mapping annotations, and Gin/Echo router calls alongside the existing FastAPI / Express / Nuxt support. New config-as-data parsers (JSON / TOML / YAML) replace the previous bare-File stubs: each top-level key plus one nested level becomes a section in the existing MdSection model, with k8s kind/name and GitHub Actions jobs surfaced. A SQL DDL parser turns CREATE TABLE and ALTER TABLE ... ADD COLUMN into table:<name> sections that list their columns. No new graph node types or schema changes. Parsers degrade cleanly on malformed input. Tests cover every new endpoint framework, config parsing, SQL DDL, and a round-trip index_file landing in the graph.
Add three git-history analysis features.
- analysis/churn.py: pure git-log functions. file_churn aggregates per-file commit counts, recency, authors, and line deltas from the bounded numstat log; file_ownership rolls up the top authors of one file. Results are cached per (repo_root, HEAD) so a long-lived owner does not re-shell out. Every git call has a timeout and degrades to an empty result when git is absent or the command fails.
- server/tools_history.py: hotspots joins churn with import in-degree (centrality over the IMPORTS edge) and ranks change-risk files; the score weights commit count 0.45, importers 0.35, recency 0.20, with log1p compression. who_knows returns a file's top authors. Both read _srv._root at call time and return JSON. Registered in server/__init__.
- viz layer-dependency diagram (FEAT-12): mermaid_layers builds a layer-to-layer IMPORTS graph from File.layer, ordered by roles.LAYER_ORDER, deduped with edge counts. Wired as a new "layers" kind into both the cgh graph CLI verb and the visualize_graph MCP tool (mermaid + dot), backend-neutral via the GraphDB protocol.

Tests: tests/test_analysis/test_churn.py builds a real git repo with pinned identities; tests/test_server cover hotspots, who_knows, and the layers diagram. 59 passed.
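One way to read the hotspots weighting is sketched below. Only the three weights and the use of log1p come from the commit message; which terms are compressed, the normalization against repo-wide maxima, and the linear recency decay over a fixed horizon are all assumptions made to get a runnable example.

```python
import math

W_COMMITS, W_IMPORTERS, W_RECENCY = 0.45, 0.35, 0.20


def _compress(value: float, ceiling: float) -> float:
    # log1p damps outliers: a file with 400 commits should not score 40x
    # a file with 10. Dividing by log1p(ceiling) maps the result into [0, 1].
    if ceiling <= 0:
        return 0.0
    return math.log1p(value) / math.log1p(ceiling)


def hotspot_score(
    commits: int,
    importers: int,
    days_since_change: float,
    max_commits: int,
    max_importers: int,
    horizon_days: float = 365.0,  # assumed decay window
) -> float:
    recency = max(0.0, 1.0 - days_since_change / horizon_days)
    return (
        W_COMMITS * _compress(commits, max_commits)
        + W_IMPORTERS * _compress(importers, max_importers)
        + W_RECENCY * recency
    )
```

Under these assumptions a file that is the most-changed, most-imported, and just edited scores close to 1, and an untouched leaf file scores 0.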
feat: git-history tools (hotspots, who_knows) + layer-dependency diagram
feat: config/SQL parsers + more endpoint frameworks
Add two MCP tools and a CI-oriented CLI command, all computed on the fly
from existing IMPORTS / CALLS edges plus File.role. No new edge type, no
schema change.
FEAT-3 (server/tools_tests.py):
- tests_for(symbol_or_file): test files that import the target (or, for
a symbol, whose functions call it). Returns the inferred mapping with a
heuristic note, federated across parent + subrepos.
- untested(role, layer): non-test source files no test imports, filtered
by role/layer, capped at 200 with a truncation note.
FEAT-10 (cli/commands_impact.py): `cgh impact --since <ref>` for PR bots.
Diffs the working tree against a git ref, reads the graph read-only (no
MCP owner needed), and emits the changed symbols, the IMPORTS blast
radius grouped by role/layer, endpoints touched, and tests to run. JSON
on clean stdout (--json / --format json) or a markdown PR-comment
summary (--format md). Degrades gracefully when the index is missing or
the graph is locked. Banner and notes go to stderr.
Shared logic lives in analysis/impact.py (pure GraphDB-protocol helpers:
tests_for, untested_files, reverse_import_bfs, symbols_in_file,
endpoints_in_files) so the MCP tools and the CLI stay in lockstep.
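The IMPORTS blast radius is a breadth-first walk over reversed import edges. The sketch below uses an in-memory importer map in place of the GraphDB protocol; the signature, the max_depth default, and the depth-per-file return value are assumptions and may differ from the real reverse_import_bfs.

```python
from collections import deque
from typing import Dict, Iterable, List


def reverse_import_bfs(
    importers_of: Dict[str, Iterable[str]],  # file -> files that import it
    changed: List[str],
    max_depth: int = 3,  # assumed bound
) -> Dict[str, int]:
    """Map each transitively importing file to its distance from a changed file."""
    depth_of = {path: 0 for path in changed}
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        depth = depth_of[current]
        if depth >= max_depth:
            continue  # bound the walk so a hub file cannot pull in the whole repo
        for importer in importers_of.get(current, ()):
            if importer not in depth_of:
                depth_of[importer] = depth + 1
                queue.append(importer)
    # The changed files themselves are not part of their own blast radius.
    return {path: d for path, d in depth_of.items() if d > 0}
```

Grouping the result by role/layer, or filtering it to test files to get "tests to run", is then a lookup on each returned path.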
feat: test-to-code mapping (tests_for/untested) + cgh impact CI mode
Ship .cs and .rb parsers behind a new `langs` optional extra (tree-sitter-c-sharp, tree-sitter-ruby), kept out of the core deps so the base install stays lean and 3.14-safe.
- csharp.py: classes/interfaces/structs/enums/records, methods and constructors, using directives as imports, invocation and object creation as calls. Handles block and file-scoped namespaces.
- ruby.py: classes and modules, def and def self. methods, require / require_relative as imports, method calls.
- _discover_parsers now skips an optional-grammar module when its grammar package is absent, so cgh behaves exactly as before when the extra is not installed. Real import errors in hard-dep parsers still propagate.
- Tests guarded with importorskip so they skip without the extra; with it installed they run (195 passed across test_parsers + test_indexer).
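The "skip only when the optional grammar is absent" rule can be expressed by checking which module the ImportError names. A minimal sketch, assuming a hypothetical load_optional_parser helper; the real _discover_parsers may be structured differently.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional_parser(
    module_name: str, optional_grammar: Optional[str] = None
) -> Optional[ModuleType]:
    """Import a parser module, returning None if only its optional grammar is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # exc.name is the module that could not be found. Swallow the error
        # only when it is exactly the declared optional grammar package.
        if optional_grammar is not None and exc.name == optional_grammar:
            return None
        raise  # a hard dependency is missing, or a different import broke
```

Matching on `exc.name` rather than catching every ImportError is what keeps real failures in hard-dependency parsers visible.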
feat: C# and Ruby parsers behind an optional cgh[langs] extra
FEAT-14 proof of concept. Adds a jedi-backed resolver that does goto-definition on every Python call site and maps each resolved definition to the graph's Function id scheme, so cross-file call edges are exact instead of name-matched. jedi is the same static engine python-lsp-server wraps, which keeps the machinery far lighter than spawning a real LSP subprocess while leaving resolve_calls_for_file as the seam a true LSP backend could replace later. Strictly opt-in and Python-only. The new precise_calls config flag (off by default, CGH_PRECISE_CALLS env override) gates it, and the resolver imports jedi lazily behind the new cgh[lsp] extra. With the flag off or the extra absent, the indexer falls back to the existing name-matched resolver and behavior is unchanged. The resolver rebuilds target paths from the indexer's repo_root so ids match stored Function nodes even when jedi resolves symlinks, restores the recursion limit parso lowers on import, and caps call sites per file. Tests cover cross-file resolution, the Class.method id form, the env override, and the same-name collision the name matcher cannot get right; all degrade to a skip when jedi is missing.
feat: opt-in precise CALLS resolver for Python (cgh[lsp], jedi-backed)
- MCP Tools: bumped to 47, added the Code Intelligence section (file_summary, impact_of, path_between, import_cycles, tests_for, untested, hotspots, who_knows), role/layer filters on search, and the wider endpoint framework list.
- Supported languages: config data (json/yaml/toml), SQL, and the optional C#/Ruby parsers; documented the langs/lsp/kuzu extras.
- New cgh impact CLI command and the graph layers scope.
- Configuration: precise_calls flag and CGH_PRECISE_CALLS / CGH_DB env vars.
- Security: corrected the auth-key section to match the 0600-file model (the .mcp.json env injection was removed); .codegraph/ is now 0700.
- Limitations: CALLS is same-file-first with an opt-in jedi precise path.
docs: update README for the audit features
Document the comma-separated bracket form (cgh[langs,lsp]) and the need to quote the spec so the shell does not glob the brackets.
docs: show how to combine install extras
Real captured output for the new CLI surface: a cgh impact blast-radius summary block, and the cgh graph layers Mermaid diagram as a GitHub-rendered mermaid block.
docs: add cgh impact + layer-diagram screenshots
Drops the codebase repo-layout tour; the Parser Plugin Architecture and Graph Schema sections that users actually need stay.
docs: remove the Architecture section from README
is_under_any resolved the subrepo roots but left an already-absolute candidate path untouched, and compared with pathlib's case-sensitive relative_to. On Windows the filesystem is case-insensitive and resolve() can change casing or 8.3 short-names, so every federated subrepo failed to match and none of its files were skipped: cgh init counted, and cgh index scanned, the whole tree including federated subdirs. Now both sides are resolved and os.path.normcase'd, with a separator-boundary prefix check (so /foo/services-bar is not treated as under /foo/services). This fixes the file census and the actual parent scan, which both route through is_under_any. POSIX behavior is unchanged (normcase is identity). Tests: nested-subrepo match (the landing-zone edf-sa/services-* layout), sibling-prefix boundary safety, root-itself, on top of the existing cases.
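The fixed comparison can be sketched as below. The function name matches the commit message, but the signature is an assumption, as is treating the root itself as "under" the root (the commit lists a root-itself test without stating the expected result).

```python
import os
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def _normalized(path: PathLike) -> str:
    # resolve() makes the path absolute and canonical; normcase lowercases
    # and unifies separators on Windows and is the identity on POSIX.
    return os.path.normcase(str(Path(path).resolve()))


def is_under_any(candidate: PathLike, roots: Iterable[PathLike]) -> bool:
    cand = _normalized(candidate)
    for root in roots:
        base = _normalized(root)
        # Compare against "base + separator" so that /foo/services-bar is
        # not mistaken for a child of /foo/services.
        if cand == base or cand.startswith(base.rstrip(os.sep) + os.sep):
            return True
    return False
```

Normalizing both sides is the key point: normalizing only the roots is what let an already-absolute candidate with different casing slip through on Windows.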
fix(federation): skip federated subrepos on Windows (path normalization)
Cuts v0.5.0 from develop. A large, fully backwards-compatible feature release built on a complete code audit.
Highlights
- `cgh impact --since` CLI command for CI/PR blast-radius (markdown/JSON, no server needed); `cgh graph layers` diagram.
- `cgh[langs]` (C#/Ruby) and `cgh[lsp]` (precise Python calls via jedi).
- [...] `.codegraph/`).

Full detail in CHANGELOG.md.
Release mechanics
Version bumped to 0.5.0, `uv.lock` refreshed (matches), CHANGELOG updated. Merging this to `main` then tagging `v0.5.0` triggers the OIDC publish workflow (pending approval gate).