
release: v0.5.0 #79

Merged
joy-software merged 37 commits into main from release/0.5.0
Jun 8, 2026

Conversation

@joy-software

Contributor

Cuts v0.5.0 from develop. A large, fully backwards-compatible feature release built on a complete code audit.

Highlights

  • 8 new MCP tools (impact_of, file_summary, path_between, import_cycles, tests_for, untested, hotspots, who_knows) -> 47 total; plus role/layer search filters.
  • cgh impact --since CLI command for CI/PR blast-radius (markdown/JSON, no server needed); cgh graph layers diagram.
  • New parsers: config-as-data (JSON/YAML/TOML), SQL DDL; more endpoint frameworks (Django, NestJS, Spring, Gin/Echo).
  • Optional extras: cgh[langs] (C#/Ruby) and cgh[lsp] (precise Python calls via jedi).
  • Walk-up root discovery (works from any subdir, like git).
  • Fixes: DuckDB/Kuzu parity (no ghost callers), Windows federation skip, config enforcement, repo-keyed caches, status display, and a security batch (constant-time auth, argument-injection guards, force_index containment, SRI-pinned CDN, removal of the dead auth env path, and a 0700 .codegraph/ directory).

Full detail in CHANGELOG.md.

Release mechanics

Version bumped to 0.5.0, uv.lock refreshed (matches), CHANGELOG updated. Merging this to main then tagging v0.5.0 triggers the OIDC publish workflow (pending approval gate).

joy-software and others added 30 commits June 6, 2026 11:14
Two cosmetic bugs in cgh status:
- A fresh repo with no DB on disk said '(would create graph.db)', but DuckDB
  has been the default backend since v0.4, so a first index creates
  graph.duckdb. Corrected the text.
- The Endpoints row rendered as a bare ', ' when counts came from the
  FTS-only or unknown fallback (no graph read). Now shows 'unknown (graph
  locked)' or 'unknown' to match the Files cell.
fix: cgh status shows graph.duckdb default + clean empty Endpoints
cgh treated the working directory literally and only looked for .codegraph/
right there, so running cgh status / index / serve from a subdir of an
initialized repo reported no index even though the root one directory up
had it.

find_codegraph_root walks up to the nearest ancestor with a .codegraph/,
the way git finds its repo root via .git. main() resolves --root through it
for every command except init/setup (which create in the literal directory)
and the internal _serve_owner / _reindex_hook (which get an explicit root).
The 'Using codegraph root: ...' hint prints to stderr so stdout and --json
output stay clean.

Cross-platform: uses pathlib resolve()/.parents, which stop at the drive
root on Windows and the filesystem root on POSIX. Nearest .codegraph/ wins,
so a federated child's subdir resolves to the child, not the parent.

Tests cover current-dir, ancestor, absent, and nearest-wins.
feat: find the codegraph root from a subdirectory (git-style)
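The walk-up described above can be sketched roughly as follows. This is a minimal illustration, not the shipped code: the function name and the `.codegraph` marker come from the commit message, but the signature and the `None` return for the absent case are assumptions.

```python
from pathlib import Path
from typing import Optional

MARKER = ".codegraph"  # directory name taken from the commit message


def find_codegraph_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that holds .codegraph/.

    Assumed signature; returns None when no ancestor has the marker.
    """
    here = start.resolve()
    # `parents` ends at the drive root on Windows and at "/" on POSIX,
    # so the loop terminates on both platforms.
    for candidate in (here, *here.parents):
        if (candidate / MARKER).is_dir():
            return candidate
    return None
```

Because the current directory is checked before its ancestors, a federated child's own `.codegraph/` is found before the parent's, which is the nearest-wins behavior the commit describes.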
- Auth token compared with hmac.compare_digest instead of != (timing-safe);
  it is the loopback bridge's only auth check.
- Removed the dead auth env-injection path (inject_auth_key_into_mcp_json,
  validate_server_auth_key were never called) and corrected the auth.py
  lifecycle docstring: the 0600 file contents are the shared secret, there
  is no env hand-off. The .codegraph/ dir is now chmod 0700 at creation so
  auth.key's parent is owner-only too.
- index_changed_files rejects a 'since' ref starting with '-' and appends
  '--', so a value like '--output=PATH' can't be parsed as a git flag
  (argument injection via the MCP arg).
- pattern_search passes the user pattern after '--' (ripgrep) and via
  '-e' (git-grep), so a pattern like '--pre=sh' can't reach ripgrep's
  preprocessor (code exec).
- force_index refuses absolute paths resolving outside the repo root
  (new _within_repo guard) and surfaces them as refused_outside_repo,
  instead of indexing arbitrary files the project never declared.
- Pinned the mermaid CDN script to 11.4.1 with an SRI integrity hash and
  crossorigin, so a compromised CDN can't run JS in the generated report.

Tests: _within_repo containment cases; full suite green (391).
fix(security): close audit findings in the MCP owner and tools
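Three of the guards above lend themselves to a short sketch. The helper names, signatures, and the exact git command below are hypothetical and chosen only for illustration; the techniques (`hmac.compare_digest`, rejecting a leading `-` and appending `--`, and a resolved-path containment check) are the ones the commit names.

```python
import hmac
from pathlib import Path


def token_matches(presented: str, expected: str) -> bool:
    """Compare tokens in constant time so timing does not reveal a matching prefix."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def git_diff_args(since: str) -> list[str]:
    """Build a git argv that an option-like ref cannot steer (hypothetical helper)."""
    if not since or since.startswith("-"):
        raise ValueError(f"refusing option-like git ref: {since!r}")
    # '--' ends option parsing, so nothing after it is read as a flag.
    return ["git", "diff", "--name-only", since, "--"]


def within_repo(path: Path, repo_root: Path) -> bool:
    """True when `path` resolves to a location inside `repo_root` (assumed semantics)."""
    root = repo_root.resolve()
    target = path if path.is_absolute() else root / path
    # resolve() collapses '..' and symlinks before the containment test.
    return target.resolve().is_relative_to(root)
```

`Path.is_relative_to` needs Python 3.9 or newer. Resolving both sides first matters: a path such as `../escape.py` only looks contained until it is normalized.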
- purge_file_data now also deletes the inbound side of self-referential
  edges (CALLS/INHERITS), matching Kuzu's DETACH DELETE. On DuckDB,
  find_callers had been returning ghost callers after a function changed.
- _resolve_calls prefers a same-file definition (a local run() no longer
  links to every run() in the repo) and memoizes name->ids per file,
  cutting the per-call-site query fan-out.
- index_file enforces max_file_size_kb and ignore_patterns, which were
  defined and documented but never applied; index_repo loads the config
  once per scan and threads it in (no per-file config read).
- _fts and .cghignore caches are keyed by repo root instead of a global
  singleton, so a multi-repo process no longer crosses streams.
- IMPORTS edges from a barrel re-export collapse to one whole-module edge
  past 50 named symbols instead of flooding the graph.
- Markdown links resolve relative to the file's own directory, so ./foo.md
  actually matches; collapsed the dead terraform kind branch; git-diff
  discovery timeout raised 10s->30s; find prunes ignore dirs at the walk
  level; reset_connection and rows() surface failures instead of swallowing;
  failed scan deletions are logged.

Tests: parity (no ghost caller after re-index), same-file CALLS scoping,
size cap + ignore-pattern skip, force bypass. Full suite green (396).
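The same-file preference in _resolve_calls could look something like the sketch below. It assumes an in-memory name index rather than the real graph queries, and the `(id, name, file)` tuple shape and id strings are made up for the example.

```python
from collections import defaultdict


def build_name_index(functions):
    """Map name -> [(file, function_id)], built once so call sites do not re-query.

    `functions` is an iterable of (function_id, name, file) tuples (assumed shape).
    """
    index = defaultdict(list)
    for func_id, name, file in functions:
        index[name].append((file, func_id))
    return index


def resolve_call(name, caller_file, index):
    """Return target ids for a call, preferring definitions in the caller's own file."""
    candidates = index.get(name, [])
    local = [func_id for file, func_id in candidates if file == caller_file]
    if local:
        return local  # a local run() does not link to every run() in the repo
    return [func_id for _, func_id in candidates]
```

When no same-file definition exists, this sketch falls back to every name match, which keeps the existing name-matched behavior for cross-file calls.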
- Hoisted the parent+children fan-out into two helpers in federation.py
  (federate_scoped / federate_flat). tools_query, tools_arch, and tools_docs
  each had a near-identical copy under two different names; they now call
  the shared helper. CLAUDE.md already told contributors to reuse such a
  helper, but none existed.
- Switched the server modules off the deprecated for_each_child_kuzu alias
  to the canonical for_each_child_graphdb (DuckDB is the default backend,
  so the _kuzu name misled). Aliases stay one more release. Fixed the stale
  'Same as for_each_kuzu' docstring.
- __main__.py: factored the --root flag, repeated ~30 times, into a single
  _add_root(p) helper; typed every CLI handler as args: argparse.Namespace;
  replaced the two literal '---' separators in the doctor help with a colon.

No behavior change. Full suite green (396), ruff + no-ai-tells clean.
refactor: shared federation fan-out, drop _kuzu names, CLI cleanup
cmd_init (438 lines) split into named phase helpers: _print_prior_state,
_detect_ai_tools, _select_tools, _count_parseable_files, _print_file_counts,
_print_init_summary. Buried mid-function imports moved into the helpers that
use them; handler typed args: argparse.Namespace. cmd_init now reads as a
sequence of steps. Pure readability, no behavior change.
cmd_status's owner/RO/FTS fallback ladder extracted into _status_via_owner,
_status_via_ro_open, _status_via_fts (+ _empty_status_source for the shared
dict shape). cmd_status selects the first tier whose counts_source != 'none'
(same order/conditions) and renders once. Rendered output unchanged. Adds
tests/test_cli/test_status_fallback.py covering the dict shape and the RO
and FTS tiers without spawning an owner. Pure readability + tests.
refactor: decompose cmd_init into phase helpers
refactor: extract cmd_status 3-tier fallback + add tests
Add four read-only, federated graph-insight tools in
codegraph/server/tools_insight.py:

- file_summary: one-shot file orientation (role/layer/lang/module_doc,
  defined functions and classes, resolved imports, and importers).
- impact_of: reverse blast radius via bounded reverse-BFS over CALLS
  (callers) or IMPORTS (importers), grouped by role/layer with reaching
  endpoints. Carries the name-matched CALLS over-count caveat.
- path_between: shortest path over CALLS or IMPORTS via forward BFS,
  reported per scope so it never crosses repo boundaries.
- import_cycles: SCC detection (iterative Tarjan) over the IMPORTS graph.

Register the module in server/__init__.py. Add role/layer filters to
search_symbols and symbol_lookup in tools_query.py (backward compatible,
empty = no filter).

Tools are tested via a fake mcp whose .tool() decorator captures the
closures unchanged, then calling them against a tiny indexed tmp repo.
feat: graph-insight MCP tools (impact_of, file_summary, path_between, import_cycles) + role/layer filters
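For readers unfamiliar with the iterative form of Tarjan's algorithm mentioned for import_cycles, here is a self-contained sketch over a plain adjacency dict. The input shape (file -> list of imported files) and the choice to report only components with more than one node or a self-import are assumptions, not the tool's documented contract.

```python
from typing import Dict, List


def import_cycles(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return strongly connected components that form cycles (iterative Tarjan)."""
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    on_stack: set = set()
    stack: List[str] = []
    cycles: List[List[str]] = []
    counter = 0

    for start in graph:
        if start in index:
            continue
        index[start] = lowlink[start] = counter
        counter += 1
        stack.append(start)
        on_stack.add(start)
        # Explicit work stack of (node, neighbour iterator) replaces recursion.
        work = [(start, iter(graph.get(start, ())))]
        while work:
            node, neighbours = work[-1]
            descended = False
            for nxt in neighbours:
                if nxt not in index:
                    index[nxt] = lowlink[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(graph.get(nxt, ()))))
                    descended = True
                    break
                if nxt in on_stack:
                    lowlink[node] = min(lowlink[node], index[nxt])
            if descended:
                continue
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]:
                # `node` is the root of a component; pop its members.
                component = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                if len(component) > 1 or node in graph.get(node, ()):
                    cycles.append(sorted(component))
    return cycles
```

The explicit work stack avoids Python's recursion limit on deep import chains, which is the usual reason to prefer the iterative variant.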
Endpoint extraction now covers Django urlpatterns, NestJS decorators,
Spring @*Mapping annotations, and Gin/Echo router calls alongside the
existing FastAPI / Express / Nuxt support.

New config-as-data parsers (JSON / TOML / YAML) replace the previous
bare-File stubs: each top-level key plus one nested level becomes a
section in the existing MdSection model, with k8s kind/name and GitHub
Actions jobs surfaced. A SQL DDL parser turns CREATE TABLE and
ALTER TABLE ... ADD COLUMN into table:<name> sections that list their
columns. No new graph node types or schema changes.

Parsers degrade cleanly on malformed input. Tests cover every new
endpoint framework, config parsing, SQL DDL, and a round-trip index_file
landing in the graph.
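As an illustration of the "top-level key plus one nested level" rule, the sketch below derives section names from a JSON document. The dotted naming for nested keys and the function name are assumptions; the real parsers emit MdSection objects and also cover TOML and YAML.

```python
import json


def config_sections(text: str) -> list[str]:
    """List section names for a JSON config: each top-level key plus one nested level."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []  # malformed input degrades to "no sections" instead of raising
    if not isinstance(data, dict):
        return []
    sections = []
    for key, value in data.items():
        sections.append(key)
        if isinstance(value, dict):
            # One nested level only; deeper structure is not expanded.
            sections.extend(f"{key}.{child}" for child in value)
    return sections
```

Stopping at one nested level keeps the section count bounded on large config files while still surfacing the keys people usually search for.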
Add three git-history analysis features.

- analysis/churn.py: pure git-log functions. file_churn aggregates
  per-file commit counts, recency, authors, and line deltas from the
  bounded numstat log; file_ownership rolls up the top authors of one
  file. Results are cached per (repo_root, HEAD) so a long-lived owner
  does not re-shell out. Every git call has a timeout and degrades to an
  empty result when git is absent or the command fails.

- server/tools_history.py: hotspots joins churn with import in-degree
  (centrality over the IMPORTS edge) and ranks change-risk files; the
  score weights commit count 0.45, importers 0.35, recency 0.20, with
  log1p compression. who_knows returns a file's top authors. Both read
  _srv._root at call time and return JSON. Registered in server/__init__.

- viz layer-dependency diagram (FEAT-12): mermaid_layers builds a
  layer-to-layer IMPORTS graph from File.layer, ordered by
  roles.LAYER_ORDER, deduped with edge counts. Wired as a new "layers"
  kind into both the cgh graph CLI verb and the visualize_graph MCP tool
  (mermaid + dot), backend-neutral via the GraphDB protocol.

Tests: tests/test_analysis/test_churn.py builds a real git repo with
pinned identities; tests/test_server cover hotspots, who_knows, and the
layers diagram. 59 passed.
feat: git-history tools (hotspots, who_knows) + layer-dependency diagram
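The hotspots weighting can be made concrete with a small sketch. The three weights and the use of log1p come from the commit message; how recency is normalized is not stated there, so the `1 / (1 + days)` mapping below is purely an assumption, as is the function name.

```python
import math

# Weights quoted in the commit message.
W_COMMITS, W_IMPORTERS, W_RECENCY = 0.45, 0.35, 0.20


def hotspot_score(commits: int, importers: int, days_since_last_change: float) -> float:
    """Combine churn, import in-degree, and recency into one change-risk score."""
    # Assumed recency mapping: 1.0 for a change today, decaying toward 0.
    recency = 1.0 / (1.0 + max(days_since_last_change, 0.0))
    # log1p compresses large counts so one very busy file does not dominate.
    return (
        W_COMMITS * math.log1p(commits)
        + W_IMPORTERS * math.log1p(importers)
        + W_RECENCY * recency
    )
```

Under these assumptions a file with many commits and many importers ranks above a file that is only recent, which matches the stated intent of ranking change-risk files.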
feat: config/SQL parsers + more endpoint frameworks
Add two MCP tools and a CI-oriented CLI command, all computed on the fly
from existing IMPORTS / CALLS edges plus File.role. No new edge type, no
schema change.

FEAT-3 (server/tools_tests.py):
  - tests_for(symbol_or_file): test files that import the target (or, for
    a symbol, whose functions call it). Returns the inferred mapping with a
    heuristic note, federated across parent + subrepos.
  - untested(role, layer): non-test source files no test imports, filtered
    by role/layer, capped at 200 with a truncation note.

FEAT-10 (cli/commands_impact.py): `cgh impact --since <ref>` for PR bots.
  Diffs the working tree against a git ref, reads the graph read-only (no
  MCP owner needed), and emits the changed symbols, the IMPORTS blast
  radius grouped by role/layer, endpoints touched, and tests to run. JSON
  on clean stdout (--json / --format json) or a markdown PR-comment
  summary (--format md). Degrades gracefully when the index is missing or
  the graph is locked. Banner and notes go to stderr.

Shared logic lives in analysis/impact.py (pure GraphDB-protocol helpers:
tests_for, untested_files, reverse_import_bfs, symbols_in_file,
endpoints_in_files) so the MCP tools and the CLI stay in lockstep.
feat: test-to-code mapping (tests_for/untested) + cgh impact CI mode
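The IMPORTS blast radius is a bounded reverse breadth-first search. The sketch below runs over a plain dict instead of the GraphDB protocol; the helper name appears in the commit, but this signature, the `max_depth` parameter, and the returned depth map are assumptions.

```python
from collections import deque


def reverse_import_bfs(imports, changed, max_depth=3):
    """Return {file: distance} for files that transitively import any changed file.

    `imports` maps a file to the files it imports (assumed shape).
    """
    # Invert the edges once: target -> set of files importing it.
    importers = {}
    for src, targets in imports.items():
        for dst in targets:
            importers.setdefault(dst, set()).add(src)

    depth = {f: 0 for f in changed}
    queue = deque(depth)
    while queue:
        current = queue.popleft()
        if depth[current] >= max_depth:
            continue  # bound the walk so a hub file cannot pull in the whole repo
        for parent in sorted(importers.get(current, ())):
            if parent not in depth:
                depth[parent] = depth[current] + 1
                queue.append(parent)
    # Drop the changed files themselves; callers already know about them.
    return {f: d for f, d in depth.items() if d > 0}
```

Filtering the result for files whose role is a test would give a rough "tests to run" list in the spirit of the PR-comment summary.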
Ship .cs and .rb parsers behind a new `langs` optional extra
(tree-sitter-c-sharp, tree-sitter-ruby), kept out of the core deps so
the base install stays lean and 3.14-safe.

- csharp.py: classes/interfaces/structs/enums/records, methods and
  constructors, using directives as imports, invocation and object
  creation as calls. Handles block and file-scoped namespaces.
- ruby.py: classes and modules, def and def self. methods, require /
  require_relative as imports, method calls.
- _discover_parsers now skips an optional-grammar module when its
  grammar package is absent, so cgh behaves exactly as before when the
  extra is not installed. Real import errors in hard-dep parsers still
  propagate.
- Tests guarded with importorskip so they skip without the extra; with
  it installed they run (195 passed across test_parsers + test_indexer).
feat: C# and Ruby parsers behind an optional cgh[langs] extra
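One way to skip an optional-grammar parser when its package is missing is to probe with `importlib.util.find_spec` before importing. The mapping and function names below are hypothetical, and the importable module names are assumed from the package names in the commit.

```python
import importlib.util

# Parser name -> grammar module it needs (module names assumed from the
# tree-sitter-c-sharp / tree-sitter-ruby package names).
OPTIONAL_GRAMMARS = {
    "csharp": "tree_sitter_c_sharp",
    "ruby": "tree_sitter_ruby",
}


def grammar_installed(module_name: str) -> bool:
    """True when a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def discover_parsers(candidates, is_installed=grammar_installed):
    """Keep hard-dependency parsers; drop optional ones whose grammar is absent."""
    enabled = []
    for name in candidates:
        grammar = OPTIONAL_GRAMMARS.get(name)
        if grammar is not None and not is_installed(grammar):
            continue  # extra not installed: behave as if the parser did not exist
        enabled.append(name)
    return enabled
```

Only the optional parsers are probed, so a genuine import error in a hard-dependency parser would still surface when that parser is loaded, as the commit requires.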
FEAT-14 proof of concept. Adds a jedi-backed resolver that does
goto-definition on every Python call site and maps each resolved
definition to the graph's Function id scheme, so cross-file call
edges are exact instead of name-matched. jedi is the same static
engine python-lsp-server wraps, which keeps the machinery far lighter
than spawning a real LSP subprocess while leaving resolve_calls_for_file
as the seam a true LSP backend could replace later.

Strictly opt-in and Python-only. The new precise_calls config flag
(off by default, CGH_PRECISE_CALLS env override) gates it, and the
resolver imports jedi lazily behind the new cgh[lsp] extra. With the
flag off or the extra absent, the indexer falls back to the existing
name-matched resolver and behavior is unchanged.

The resolver rebuilds target paths from the indexer's repo_root so ids
match stored Function nodes even when jedi resolves symlinks, restores
the recursion limit parso lowers on import, and caps call sites per
file. Tests cover cross-file resolution, the Class.method id form, the
env override, and the same-name collision the name matcher cannot get
right; all degrade to a skip when jedi is missing.
feat: opt-in precise CALLS resolver for Python (cgh[lsp], jedi-backed)
- MCP Tools: bumped to 47, added the Code Intelligence section (file_summary,
  impact_of, path_between, import_cycles, tests_for, untested, hotspots,
  who_knows), role/layer filters on search, and the wider endpoint framework
  list.
- Supported languages: config data (json/yaml/toml), SQL, and the optional
  C#/Ruby parsers; documented the langs/lsp/kuzu extras.
- New cgh impact CLI command and the graph layers scope.
- Configuration: precise_calls flag and CGH_PRECISE_CALLS / CGH_DB env vars.
- Security: corrected the auth-key section to match the 0600-file model
  (the .mcp.json env injection was removed); .codegraph/ is now 0700.
- Limitations: CALLS is same-file-first with an opt-in jedi precise path.
docs: update README for the audit features
Document the comma-separated bracket form (cgh[langs,lsp]) and the need to
quote the spec so the shell does not glob the brackets.
docs: show how to combine install extras
joy-software and others added 7 commits June 8, 2026 09:48
Real captured output for the new CLI surface: a cgh impact blast-radius
summary block, and the cgh graph layers Mermaid diagram as a GitHub-rendered
mermaid block.
docs: add cgh impact + layer-diagram screenshots
Drops the codebase repo-layout tour; the Parser Plugin Architecture and
Graph Schema sections that users actually need stay.
docs: remove the Architecture section from README
is_under_any resolved the subrepo roots but left an already-absolute
candidate path untouched, and compared with pathlib's case-sensitive
relative_to. On Windows the filesystem is case-insensitive and resolve()
can change casing or 8.3 short-names, so every federated subrepo failed to
match and none of its files were skipped: cgh init counted, and cgh index
scanned, the whole tree including federated subdirs.

Now both sides are resolved and os.path.normcase'd, with a separator-boundary
prefix check (so /foo/services-bar is not treated as under /foo/services).
This fixes the file census and the actual parent scan, which both route
through is_under_any. POSIX behavior is unchanged (normcase is identity).

Tests: nested-subrepo match (the landing-zone edf-sa/services-* layout),
sibling-prefix boundary safety, root-itself, on top of the existing cases.
fix(federation): skip federated subrepos on Windows (path normalization)
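The normalized containment check can be sketched as below. The function name matches the commit, but the signature is an assumption, as is treating a root as being "under" itself.

```python
import os
from pathlib import Path


def _normalized(path) -> str:
    """Resolve, then case-fold on Windows (normcase is the identity on POSIX)."""
    return os.path.normcase(str(Path(path).resolve()))


def is_under_any(candidate, roots) -> bool:
    """True when `candidate` is one of `roots` or lies inside one of them."""
    cand = _normalized(candidate)
    for root in roots:
        base = _normalized(root)
        # Require a separator after the prefix so /foo/services-bar is not
        # mistaken for a child of /foo/services.
        if cand == base or cand.startswith(base.rstrip(os.sep) + os.sep):
            return True
    return False
```

Normalizing both sides the same way is the key point: a mismatch between a resolved root and an unresolved candidate is what let federated subrepos slip through on Windows.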
@joy-software joy-software merged commit 2b8e603 into main Jun 8, 2026
8 checks passed