fix(extract): route extensionless shebang scripts to their AST extractor #1680

Open
Stashub wants to merge 983 commits into Graphify-Labs:main from Stashub:fix-extensionless-shebang-dispatch
Conversation

@Stashub Stashub commented Jul 5, 2026

Problem

detect.classify_file already labels extensionless files with a bash/python/node/... shebang as CODE (via _shebang_interpreter), but extract._get_extractor dispatches purely on path.suffix. An extensionless CLI entry point — devctl, manage, gradlew-style wrappers — is therefore detected as code and then silently contributes zero nodes to the graph, and its doc-referenced symbols stay dangling stub IDs.

Found in the wild: a bash CLI devctl (#!/usr/bin/env bash, no extension) — the main executable of the corpus — was missing from the graph entirely while the 5 other code files extracted fine; the health diagnostic showed dangling-endpoint edges pointing at its never-created node IDs.

Fix

In _get_extractor, resolve extensionless files through the same detect._shebang_interpreter and a new _SHEBANG_DISPATCH map, so extract honors the same signal detect already trusts. Only interpreters with a real extractor are mapped (python/bash-family/node/ruby/lua/php/julia); detect's wider set (perl, fish, tcsh, Rscript) stays unmapped and skipped rather than being mis-parsed by a wrong grammar.
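
A minimal sketch of the dispatch described above, not the PR's actual code. The names `shebang_interpreter`, `SHEBANG_DISPATCH`, and `get_extractor` are hypothetical stand-ins for `detect._shebang_interpreter`, `_SHEBANG_DISPATCH`, and `_get_extractor`, the table keys are illustrative, and the values are plain strings where the real map presumably holds extractor callables.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Hypothetical interpreter -> extractor table. Only interpreters with a real
# extractor are listed; perl, fish, tcsh, Rscript are deliberately absent.
SHEBANG_DISPATCH = {
    "python": "python", "python3": "python",
    "bash": "bash", "sh": "bash", "zsh": "bash",
    "node": "javascript", "ruby": "ruby", "lua": "lua",
    "php": "php", "julia": "julia",
}


def shebang_interpreter(first_line: str) -> str | None:
    """Return the interpreter basename named by a shebang line, or None."""
    if not first_line.startswith("#!"):
        return None
    try:
        parts = shlex.split(first_line[2:].strip())
    except ValueError:  # unbalanced quotes and similar
        return None
    if not parts:
        return None
    if Path(parts[0]).name == "env":
        # Skip env flags (such as -S) and VAR=value assignments.
        rest = [p for p in parts[1:] if not p.startswith("-") and "=" not in p]
        return Path(rest[0]).name if rest else None
    return Path(parts[0]).name


def get_extractor(path: Path, first_line: str) -> str | None:
    """Shebang fallback only; suffix-based dispatch is out of scope here."""
    if path.suffix:
        return None
    interp = shebang_interpreter(first_line)
    return SHEBANG_DISPATCH.get(interp) if interp else None
```

With this shape, an unmapped interpreter falls through to `None`, so the file is skipped instead of being parsed with the wrong grammar.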

Tests

  • test_extensionless_shebang_via_dispatch — bash & python3 shebangs, incl. the env -S split-args form
  • test_extensionless_without_usable_shebang_stays_unsupported — plain text and perl stay None
  • test_extract_extensionless_bash_cli_end_to_end — node IDs follow the path-stem scheme, so doc-created stub IDs merge with the real code nodes

pytest tests/test_extract.py tests/test_detect.py — 235 passed.

🤖 Generated with Claude Code

safishamsi and others added 30 commits June 13, 2026 11:28
perf: parallelize save_manifest file hashing with ThreadPoolExecutor
…ersion probe

extract.py: clamp ProcessPoolExecutor max_workers to 61 on Windows (issue Graphify-Labs#1298).
Python's ProcessPoolExecutor hard-caps at 61 on Windows via WaitForMultipleObjects;
>61-core machines crashed on AST extraction. Clamp applied after all input paths
(auto-compute, GRAPHIFY_MAX_WORKERS, --max-workers) to cover all three.
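
The clamp can be pictured as below. This is a sketch under assumptions: `clamp_workers` and `WINDOWS_MAX_WORKERS` are hypothetical names, and the `platform` parameter exists only so the behavior can be exercised off Windows.

```python
import sys

# ProcessPoolExecutor rejects max_workers above 61 on Windows.
WINDOWS_MAX_WORKERS = 61


def clamp_workers(requested: int, platform: str = sys.platform) -> int:
    """Apply the clamp last, whatever the source of the requested value."""
    workers = max(1, requested)
    if platform == "win32":
        workers = min(workers, WINDOWS_MAX_WORKERS)
    return workers
```

Applying the clamp after the value is resolved is what lets one check cover the auto-computed count, the environment variable, and the CLI flag.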

build.py: skip ghost-merge when two AST nodes share (basename, label) key (issue Graphify-Labs#1257).
When same-named symbols appear in same-named files across directories (e.g. two
render() in two index.ts), last-writer-wins produced an arbitrary canonical node
and mis-pointed all edges. Now tracked in _loc_collisions; ambiguous keys are
skipped in Pass 2, leaving the ghost intact rather than merging into the wrong node.
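
One way to express the collision tracking, as a hedged sketch: `index_by_location` and `resolve_ghost` are hypothetical helpers, and the node dict fields (`id`, `source_file`, `label`) are assumed for illustration only.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def index_by_location(nodes):
    """Split (basename, label) keys into unambiguous ones and collisions."""
    buckets = defaultdict(list)
    for node in nodes:
        key = (PurePosixPath(node["source_file"]).name, node["label"])
        buckets[key].append(node["id"])
    unique = {k: ids[0] for k, ids in buckets.items() if len(ids) == 1}
    collisions = {k for k, ids in buckets.items() if len(ids) > 1}
    return unique, collisions


def resolve_ghost(ghost_key, unique, collisions):
    """Return the canonical node id, or None to leave the ghost intact."""
    if ghost_key in collisions:
        return None  # ambiguous: merging could pick the wrong node
    return unique.get(ghost_key)
```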

__main__.py: ignore OSError on unreadable .graphify_version probes (issue Graphify-Labs#1299).
On restricted-permission installs or network mounts, .exists()/.read_text() raised
PermissionError and crashed every graphify query/explain/path call at startup.
All three FS probes now wrapped in try/except OSError: return.
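
A sketch of one such guarded probe, assuming a hypothetical helper name `read_version_marker`; the real code wraps three separate probes and simply returns.

```python
from pathlib import Path


def read_version_marker(path: Path):
    """Return the marker's contents, or None if it is missing or unreadable."""
    try:
        if not path.exists():
            return None
        return path.read_text(encoding="utf-8").strip()
    except OSError:
        # PermissionError, IsADirectoryError, and network-mount failures
        # are all OSError subclasses; a failed probe must not crash startup.
        return None
```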

prs.py: resolve claude.cmd on Windows in the claude-cli backend (issue Graphify-Labs#1288).
The _call_llm and _call_claude_cli paths were already fixed; prs.py had the same
bare ["claude", ...] call that fails on Windows npm installs with WinError 2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-scanners

Wire bandit and pip-audit into CI
…-summaries-rfc

docs: add RFC for file-level node summaries
…or graphify-mcp

export.py: to_json now accepts community_labels and writes community_name onto
each node. Previously cluster-only wrote labels only to GRAPH_REPORT.md,
graph.html, and .graphify_labels.json — graph.json stored only the numeric cid,
so query/MCP showed blank or numeric community values (Graphify-Labs#1305).

__main__.py: pass community_labels=labels to to_json in cluster-only path.
explain command now prefers community_name over raw numeric community field.

serve.py: query and get_node read paths prefer community_name over community,
with fallback so old graphs without the field still work. Adds --graph flag as
an alias for the positional argument in graphify-mcp/_main(), fixing
"unrecognized arguments: --graph" for users following the documented pattern
shared by every other graphify subcommand (Graphify-Labs#1304).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_BASE_URL/ANTHROPIC_MODEL (Graphify-Labs#1273)

Custom OPENAI/ANTHROPIC base-url + model env vars for self-hosted and proxy endpoints. CI green (3.10/3.12).
…s, function expressions (Graphify-Labs#1323)

Extract JS/TS this.X=, exports.X=, prototype, class arrow fields, and function expressions (closes Graphify-Labs#1322). Validated locally against v8: full suite 2069 passed.
- Graphify-Labs#1315: add .psm1 to CODE_EXTENSIONS + _DISPATCH so PowerShell modules are indexed
- Graphify-Labs#1327: synthesize a module node for Swift import targets (new LanguageConfig
  flag synthesize_import_module_nodes) so imports edges survive build.py pruning;
  strengthen the Swift dangling-edge test to also assert edge targets
- Graphify-Labs#1317: dedupe parallel edges by (source,target,relation) in the --no-cluster
  and incremental update write paths so edge counts are deterministic and
  `update` is idempotent
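
The edge dedup from the last item could look roughly like this. `dedupe_edges` is a hypothetical name and the edge dict keys are assumed; the first occurrence wins so that order stays deterministic.

```python
def dedupe_edges(edges):
    """Drop parallel edges sharing the same (source, target, relation)."""
    seen = set()
    unique = []
    for edge in edges:
        key = (edge["source"], edge["target"], edge["relation"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(edge)
    return unique
```

Running the function on its own output returns the same list, which is the idempotence property the `update` path needs.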

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bump version to 0.8.40; document this release in CHANGELOG (PowerShell .psm1
indexing, Swift import survival, no-cluster edge dedup, custom OpenAI/Anthropic
endpoints, JS/TS assignment-form extraction, community-name + --graph fixes,
four production bug fixes, perf, security CI); add .psm1 to the README
supported-extensions table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adopts the approach from Graphify-Labs#1330 (thanks @duncan-daydream) on top of the v0.8.40
Swift import fix: _import_swift returns (id,label) module pairs, the extractor
materializes a type=module anchor node per import, and _disambiguate_colliding_node_ids
exempts type=module nodes so the same module imported from N files collapses to
one shared node (enables reverse traversal "what imports CoreKit"). The --no-cluster
writer now dedupes nodes by id and edges to match the clustered build_from_json path.

Replaces the interim _import_label/synthesize_import_module_nodes mechanism.
Adds tests/test_swift_import_resolution.py (cross-file collapse, build survival)
and dedupe_nodes coverage. Refs Graphify-Labs#1327, Graphify-Labs#1330.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same-named types in different packages left implements/inherits edges stuck on
bare shadow stubs, isolating the real interface (_rewire_unique_stub_nodes only
fixes the globally-unique case). New _resolve_java_type_references pass uses each
referencing file's import statements (+ package decl) to build an FQN->def index,
re-points dangling implements/inherits/imports edges to the exact definition, and
drops the orphaned stub. External/stdlib imports stay unresolved (correct). Runs
after id-disambiguation so target ids are final. Java-scoped; other _extract_generic
languages share the same bare-name fallback and remain a follow-up.

Adds tests/test_java_type_resolution.py (simple, ambiguous-by-import, build-survival).
Refs Graphify-Labs#1318.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The query skill was split across two fragments so no platform got both
capabilities: Claude had the vocab/IDF query-expansion step but no fallback if
the CLI was unavailable; every other platform had the inline NetworkX fallback
but the weaker raw-question matcher. Merge into one unified query reference +
stub (Step 0 expansion -> CLI traversal -> inline NetworkX fallback, plus
path/explain inline) shipped to all hosts. Remove the query_variant enum, its
toml field, and the _CLI_ONLY_QUERY_HEADINGS coverage-audit exemption. Re-render
all skill artifacts and re-bless expected/. skillgen check/audit-coverage/
monolith-roundtrip/schema-singleton all pass. Refs Graphify-Labs#1325.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes Graphify-Labs#1338 (Unicode NFD/NFC): serve._find_node now matches tokenized labels; affected.resolve_seed NFC-normalizes + casefolds. Reviewed: full suite 2087 passed, CLI smoke clean, no regressions. Thanks @balloon72.
Fixes Graphify-Labs#1334: detect npm/yarn package.json workspaces (array + yarn object form), pnpm precedence preserved. Reviewed: full suite 2087 passed, no regressions. Thanks @balloon72.
Harden incremental no-cluster updates: fixes empty-write graph wipeout on no-op update --no-cluster (Graphify-Labs#1347) and git-hook subdir path resolution (Graphify-Labs#1348). Complementary to Graphify-Labs#1317. Validated: full suite 2107 passed, no-op re-run no longer wipes graph. Thanks @pkudinov.
…edge emission (Graphify-Labs#1331) (Graphify-Labs#1341)

Index PowerShell .psd1 manifests + emit Import-Module/dot-source edges (closes Graphify-Labs#1331). Builds on the shipped .psm1 support. Validated: full suite 2107 passed, 18 new tests. Thanks @geektan123.
…stale edges (Graphify-Labs#1344)

build_merge: prune a re-extracted file's stale nodes/edges before merge instead of accumulating (fixes Graphify-Labs#1283, Graphify-Labs#1285). Validated: full suite 2107 passed. Thanks @RelywOo.
…t fixes (Graphify-Labs#1357)

Harden HTML export against U+2028/U+2029 script-breakout XSS + two crash-on-adversarial-input fixes (non-dict LLM JSON, _extract_parallel IndexError). Validated: full suite 2107 passed, HTML export smoke clean. Thanks @mistic96.
…nstructors (Graphify-Labs#1356)

Capture property/field initializer constructor calls, build a per-file Swift
type table from property/parameter declarations, and add a member-call
resolution pass that types the receiver and emits an edge only when the type
name resolves to exactly one definition. Additive and INFERRED-only; the
is_member_call drop and the Graphify-Labs#543/Graphify-Labs#1219 god-node guards stay intact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ned (Graphify-Labs#1361)

The /graphify --update runbook called build_merge with absolute prune_sources
but no root=, so _norm_source_file never relativized them to match the graph's
relative source_file values. Nothing was pruned and changed/deleted files left
ghost nodes that compounded on every incremental update. Fix the shared skillgen
fragment to pass root='INPUT_PATH' and re-render all platform artifacts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Labs#1324)

to_canvas built cards solely by iterating communities, so a graph with no
community data (--no-cluster builds, or a missing analysis sidecar) wrote the
empty 32-byte {"nodes":[],"edges":[]} shell while notes rendered fine. Fall back
to one synthetic community covering every node so the canvas reflects the graph.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Semantic/LLM edges occasionally omit source_file, which build only normalized
when already present, so the field reached graph.json empty and downstream
validation flagged it. Backfill from the source (then target) node in
build_from_json and in the --no-cluster raw-write path, which bypasses it.
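
A minimal sketch of the backfill order (source node first, then target), assuming a hypothetical `backfill_edge_source_file` helper and plain dicts for edges and nodes.

```python
def backfill_edge_source_file(edges, nodes_by_id):
    """Fill a missing or empty edge source_file from its endpoint nodes."""
    for edge in edges:
        if edge.get("source_file"):
            continue  # already set; normalization happens elsewhere
        for endpoint in (edge.get("source"), edge.get("target")):
            source_file = nodes_by_id.get(endpoint, {}).get("source_file")
            if source_file:
                edge["source_file"] = source_file
                break
    return edges
```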

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eleted-only

Completes the source_file convention fix begun in Graphify-Labs#1344 (build_merge
replace-on-re-extract) and Graphify-Labs#1361 (pass root= to build_merge in the --update
runbook). Two gaps still let the full build and incremental --update emit
different source_file bases for the same file, so the source_file-keyed replace
missed and duplicates accumulated:

1. extraction-spec(.md/-compact.md): the subagent's source_file slot was an
   unpinned "relative/path", so it invented a base per run (and the node id,
   derived from the same path, drifted too). Pin it to the verbatim FILE_LIST
   path so _norm_source_file(root) canonicalizes every run identically.

2. core.md: the full build called build_from_json WITHOUT root=, so Graphify-Labs#1361's
   update-side root= had no matching base on the full-build side. Pass
   root='INPUT_PATH' at both sites (Step 4 export, Step 5 report) so the full
   build and --update relativize to the same base.

update.md: prune_sources now lists deleted files only. Changed files are replaced
by build_merge (Graphify-Labs#1344); once root= aligns the bases, leaving `changed`
in prune_sources would delete the freshly re-extracted nodes.

Engine (build.py) unchanged. Regenerated all skill artifacts via
tools/skillgen/gen.py. Adds test_build_merge_root_collapses_convention_drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…te, and prefix-divergent labels (Graphify-Labs#1284, Graphify-Labs#1243)

Three pass-2 guards (mirrored in the --dedup-llm pair collection): block merges
when labels' embedded numbers differ as zero-padding-insensitive multisets;
block cross-file merges of file-anchored rationale/document nodes (same-file
still merges); and score cross-file long labels on plain Jaro instead of
Jaro-Winkler so the prefix bonus can't fabricate merges of shared-prefix but
token-divergent entities (jest-native vs react-native), while genuine cross-file
duplicates still clear Jaro and same-file near-duplicates keep Jaro-Winkler.
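
The first guard, comparing embedded numbers as zero-padding-insensitive multisets, can be sketched as follows. `number_multiset` and `numbers_block_merge` are hypothetical names; the real guard may tokenize differently.

```python
import re
from collections import Counter


def number_multiset(label: str) -> Counter:
    """Multiset of integers embedded in a label; int() drops zero padding."""
    return Counter(int(digits) for digits in re.findall(r"\d+", label))


def numbers_block_merge(label_a: str, label_b: str) -> bool:
    """True when the two labels must not be merged on numeric grounds."""
    return number_multiset(label_a) != number_multiset(label_b)
```

A multiset comparison, unlike a set comparison, also separates labels that repeat a number a different number of times.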

Co-Authored-By: van4oza <van4oza@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aphify-Labs#1365)

ollama/openai/deepseek/kimi set max_tokens in their backend config, but the
openai-compat dispatch read only max_completion_tokens (which only gemini
defines), so their output silently capped at the 8192 fallback and truncated
deep-mode JSON. Read either key and give the openai config an explicit cap;
GRAPHIFY_MAX_OUTPUT_TOKENS still overrides.
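
A sketch of the key lookup, with assumptions stated: `resolve_max_output_tokens` is a hypothetical name, the `env` parameter exists for testability, and the precedence between the two config keys is a guess. Only the 8192 fallback and the environment override come from the description above.

```python
import os

FALLBACK_MAX_TOKENS = 8192


def resolve_max_output_tokens(backend_cfg: dict, env=None) -> int:
    """Env override first, then either config key, then the fallback."""
    env = os.environ if env is None else env
    override = env.get("GRAPHIFY_MAX_OUTPUT_TOKENS")
    if override:
        return int(override)
    for key in ("max_completion_tokens", "max_tokens"):
        if backend_cfg.get(key):
            return int(backend_cfg[key])
    return FALLBACK_MAX_TOKENS
```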

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oubleshooting

Add changelog entries for the Swift cross-file (Graphify-Labs#1356), update prune (Graphify-Labs#1361),
obsidian canvas (Graphify-Labs#1324), and edge source_file backfill (Graphify-Labs#1279) fixes that
shipped without changelog notes, and a README troubleshooting entry for the
LLM JSON-truncation warnings and how to reduce them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
safishamsi and others added 29 commits July 2, 2026 23:25
…s#1609)

C# had no member-call resolver (unlike Swift/Python/Ruby/TS/C++/ObjC), so
`recv.Method()` fell back to a bare method-name match against label_to_nid —
which, under ambiguity, silently mis-bound `_server.Save()` to an unrelated
`Cache.Save()`. That's a WRONG edge, not just a missing one, and it left
delegation-heavy C# call graphs (wrappers, service layers) blind across typed
member/param boundaries.

Mirrors the C++ Graphify-Labs#1547 pattern:
- capture the member_access_expression receiver (simple identifier or `this`)
  into member_receiver and set is_member_call in the C# invocation branch;
- defer ALL C# member calls with a receiver to the resolver (tgt_nid = None) so
  the bare in-file match can't fire, and emit a raw_call tagged lang="csharp";
- _csharp_member_type_table: file-wide name -> Type from fields, properties,
  parameters, and locals (incl. `var v = new T()`), first-binding-wins;
- _resolve_csharp_member_calls: `this` -> enclosing class (EXTRACTED),
  capitalized -> the named type (EXTRACTED), else the receiver's table type
  (INFERRED), each gated by the single-definition guard; no method on the type
  -> no edge. Registered for .cs.

Verified: the ambiguous `_server.Save()` now resolves to Server.Save and NOT
Cache.Save; field/param/local/this/Type.static/cross-file all resolve; dynamic
receiver and absent-method emit nothing; unqualified calls unregressed. 8 new
tests, full suite 2841.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…raphify-Labs#1618)

A node whose source_file equals the absolute scan root (e.g. a project-level
semantic concept the LLM attributed to the whole repo) relativized to Path('.'),
and _semantic_id_remap fed that into _file_stem, whose path.with_suffix("")
raises `ValueError: '.' has an empty name`. The crash landed in final graph
assembly — AFTER all LLM extraction cost was spent — writing no graph.json at
all, and leaving `cluster-only` to then report "no graph found".

Two guards: _file_stem returns "" for a name-less path (protects every caller,
not just this one), and both _semantic_id_remap passes skip a root-equal
source_file explicitly (it has no per-file identity to remap — id left
untouched). Reported with a minimal LLM-free repro by @sub4biz.

Not a 0.9.5 regression: _semantic_id_remap/_file_stem are byte-identical to
0.9.4; the latent path was only hit when dedup produced a root-source_file node.
4 regression tests (dot-path stem, remap no-crash, build_from_json with a
root-level concept node, normal remap unaffected). Full suite 2849.
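
The first guard might look like the sketch below. `file_stem` is a hypothetical stand-in for `_file_stem`, and the posix-string return shape is assumed; only the name-less early return mirrors the fix described above.

```python
from pathlib import Path


def file_stem(path: Path) -> str:
    """Path without its suffix, or "" when the path has no name at all."""
    if not path.name:
        # Path(".") has an empty name, so with_suffix("") would raise ValueError.
        return ""
    return path.with_suffix("").as_posix()
```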

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…abs#1445)

_pick_seeds' gap_ratio cutoff discards any candidate scoring below 20%
of the top score. On a multi-term natural-language query, one term's
incidental EXACT label match on a node that is otherwise unrelated to
the query's intent (e.g. a common word also used as a field name or
identifier elsewhere in the corpus) scores ~1000x higher than any
SUBSTRING match on the query's other, actually-relevant terms
(_EXACT_MATCH_BONUS vs _SUBSTRING_MATCH_BONUS). The cutoff then
silently discards every one of those substring-tier candidates as BFS
seeds, so the traversal only ever explores the neighborhood of the one
unrelated exact match, and `query` returns confidently-wrong results
with no signal that anything went wrong.

This matches Graphify-Labs#1445's reproduction exactly: a vague query that doesn't
name a target symbol seeds from unrelated "concept-dense" nodes
instead, even though the target node is present in the graph.

_pick_seeds now optionally accepts the graph and the tokenized query
terms; when supplied, it guarantees at least one seed per distinct
term that has any match at all, so one term's collision cannot starve
out the others. Ties within a term are broken by node degree, so an
isolated incidental match doesn't out-rank a real, well-connected hub
for that term. The parameters default to None and existing callers
that don't pass them see byte-identical behavior (see
test_pick_seeds_without_diversity_args_is_unchanged).
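
A simplified sketch of the per-term guarantee, not the actual `_pick_seeds` signature. Here `scores` maps node to total score, `term_matches` maps each query term to its matching nodes and scores, and `degree` maps node to graph degree; all three shapes are assumptions made for illustration.

```python
def pick_seeds(scores, gap_ratio=0.2, term_matches=None, degree=None):
    """Gap-ratio cutoff, plus at least one seed per term when terms are given."""
    if not scores:
        return []
    top = max(scores.values())
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    seeds = [node for node, score in ranked if score >= top * gap_ratio]
    if term_matches is None:
        return seeds  # legacy callers: cutoff only
    degree = degree or {}
    for matches in term_matches.values():
        if not matches or any(node in seeds for node in matches):
            continue  # term has no match, or is already represented
        # Best score wins; ties go to the better-connected node.
        best = max(matches, key=lambda n: (matches[n], degree.get(n, 0)))
        seeds.append(best)
    return seeds
```

In the failure shape above, the exact match still seeds first, and each starved term contributes its best-connected match afterwards.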

Adds a regression test reproducing the exact failure shape from
Graphify-Labs#1445 and confirms the previously-starved target node is recovered
as a seed once G/terms are supplied.

Full test suite (74 tests) and ruff both pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-Labs#1623)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#1619 A2)

The query reference doc's inline vocab-harvest / fallback-search snippets used
bare Path(...).read_text()/write_text(), which on Windows (default cp1252)
crash with UnicodeEncodeError on the cross-language corpora the doc itself
demonstrates (Cyrillic labels like обработчик). Add encoding="utf-8" to all
five sites in the skillgen source fragment and regenerate; blessed expected/,
skillgen --check + --monolith-roundtrip green.

Scoped to the concrete reproduced crash; the larger Graphify-Labs#1619 findings (the
Windows .exe interpreter-guard rewrite, INPUT_PATH backslash guidance, BOM
handling) are a separate skill-template pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…receivers (Graphify-Labs#1630)

The Graphify-Labs#1316 resolver handled `this.injectedField.method()`, but a receiver whose
type comes from a local `const x = new Foo()` binding (Pattern A) or a
type-annotated parameter — including inside a returned closure (Pattern B) —
produced no calls edge, so `affected <method>` silently under-reported.

- _ts_receiver_type_table: augment the per-file type table with local
  `new` bindings (name -> constructor type) and bare-typed parameters
  (`(svc: Svc)` -> svc: Svc), merged after the constructor-injection entries
  (which win on a name clash). Only a bare type_identifier is recorded — an
  array/union/generic/qualified/predefined type is skipped (precision).
- walk_calls now descends into an inline/returned JS/TS closure that is not
  separately tracked in function_bodies (e.g. `return () => svc.doThing()`),
  attributing its calls to the enclosing function, instead of stopping at the
  arrow boundary. A tracked-body-id set prevents double-walking const-assigned
  arrows.

The existing _resolve_typescript_member_calls then resolves both via the
receiver type with its single-definition guard. Verified on the real-CLI shape
(absolute paths + graphify-out cache): both patterns resolve, ambiguity binds
to the right class (Svc not Cache), untyped/array-typed receivers emit nothing.
5 tests, full suite 2871.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hify-Labs#1236 follow-up)

The Graphify-Labs#1236 fix guarded to_obsidian's member loop but not to_canvas, so
`graphify export obsidian` (which also writes graph.canvas) still crashed with
KeyError on a community member id absent from G — after the notes exported,
leaving a partial mirror. Reported on 0.9.5 by @swells808.

Apply the same `m in G and m in node_filenames` filter in both to_canvas loops:
the box-sizing loop (so the group box matches the cards actually laid out) and
the card-layout loop (so the sort/label deref and the node_filenames fallback
never touch a dangling id). Regression test added alongside the to_obsidian one.
Full suite 2872.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…fy-Labs#1631, Graphify-Labs#1638, Graphify-Labs#1632)

Graphify-Labs#1631: a malformed LLM chunk (a stray non-dict entry in edges/nodes/hyperedges)
crashed the AST+semantic merge and the semantic-cache write with
`AttributeError: 'list' object has no attribute 'get'`, discarding every
successful chunk and writing no graph.json. `_parse_llm_json` now sanitizes each
fragment at the single parse chokepoint (dict entries only; non-list values
coerced to []), protecting the cache writer, the adaptive-retry merge, and the
CLI merge in one place.
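
The sanitization rule (dict entries only, non-list values coerced to `[]`) can be sketched as below. `sanitize_fragment` is a hypothetical helper; the real logic lives inside `_parse_llm_json`.

```python
LIST_KEYS = ("nodes", "edges", "hyperedges")


def sanitize_fragment(fragment):
    """Coerce a parsed LLM chunk into lists of dicts for every list key."""
    if not isinstance(fragment, dict):
        return {key: [] for key in LIST_KEYS}
    clean = dict(fragment)
    for key in LIST_KEYS:
        value = fragment.get(key)
        if not isinstance(value, list):
            value = []  # missing, null, or a stray scalar/dict
        clean[key] = [item for item in value if isinstance(item, dict)]
    return clean
```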

Graphify-Labs#1638: an unresolved bare npm import (`import colors from "tailwindcss/colors"`)
emitted an imports_from edge to the bare id `colors`, which build.py's
pre-migration alias index then remapped onto an unrelated local file of that
stem (backend/utils/colors.py) - a confident EXTRACTED cross-language phantom
edge, one per importing file. The external-import fallback now namespaces its
target with the `ref` prefix (the J-4 convention), so it can never collapse to a
local node id; the ref target has no node, so build drops it as an external
reference.

Graphify-Labs#1632: with a parallel LLM backend, extract_corpus_parallel merged chunk results
in completion order, so which network call returned first reordered nodes/edges
run-to-run even when the model returned identical content - churning graph.json.
Chunks are now merged in deterministic submission order after the pool drains
(matching the serial path); the progress callback still fires in completion
order. The model's own content variance is unchanged (irreducible).
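
A minimal sketch of merging in submission order while still reporting progress in completion order. `extract_chunks`, `worker`, and `on_progress` are hypothetical names, and a thread pool stands in for whatever executor the real backend uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_chunks(chunks, worker, on_progress=None, max_workers=4):
    """Run worker over chunks in parallel; merge results in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, chunk) for chunk in chunks]
        # Progress fires as each call returns, in whatever order that is.
        for done in as_completed(futures):
            if on_progress:
                on_progress(futures.index(done))
        # Merge only after the pool drains, walking the submission order.
        results = [future.result() for future in futures]
    merged = {"nodes": [], "edges": []}
    for result in results:
        merged["nodes"].extend(result["nodes"])
        merged["edges"].extend(result["edges"])
    return merged
```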

Full suite: 2882 passed, 3 skipped. Validated end-to-end via a local wheel build
on a mixed TS+Python corpus: `explain colors.py` shows only the real importer,
and graph.json is byte-identical across repeated runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`interface X extends A, B` captured the parent list in iface_re group 2 but the handler only read group 1, so no inheritance edge was emitted. Split the parent list and emit one extends edge per parent (mirroring the class branch).

`class Foo : Bar by baz` produced no edge because the delegation_specifier loop only handled constructor_invocation and bare user_type children; the `by` form wraps user_type in an explicit_delegation node. Add that branch so the implements edge (and generic-arg recovery) fires.
…hify-Labs#1644 (kotlin by delegation)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…stant-receiver calls (Graphify-Labs#1640, Graphify-Labs#1634)

Graphify-Labs#1640 (node extraction): the extractor only created nodes for `class Foo`, so
plain `module Foo`, `Foo = Struct.new(...) do ... end`, `Foo = Class.new(Super)`
and `Result = Data.define(...)` produced no container node — their methods hung
off the file via `contains` with dot-less labels and no edge could target them.
`module` is now a container type (methods attach via `method`, nested modules
included), and a constant assignment whose RHS is Struct.new/Class.new/Data.define
synthesizes a class node named after the constant, attaches block-defined methods
to it, and emits an `inherits` edge for `Class.new(Super)`. Plain constant
assignments (MAX = 100, X = Foo.new) are untouched.

Graphify-Labs#1634 (resolution): constant-receiver singleton calls (`Service.call`,
`Model.where`, `SomeJob.perform_async`) emitted no edge, so a Zeitwerk-autoloaded
Rails app (no requires) had near-zero cross-file edges. resolve_ruby_member_calls
now handles a capitalized receiver with any callee: bind to the class's owned
singleton/instance method (`def self.call`) when present, else to the class node
itself so inherited/dynamic class methods (ActiveRecord where/find_by) still give
blast-radius. Namespaced receivers resolve by bare class name. The
single-owning-class god-node guard is kept — ambiguous receivers resolve to
nothing, never a wrong edge.

The two compound: PaymentProcessor#process -> TaxCalculator.rate_for needs the
module node (Graphify-Labs#1640) AND the resolver (Graphify-Labs#1634); both now land.

Full suite: 2893 passed, 3 skipped. Adversarial smoke confirms no false class
nodes from plain/multiple assignments and no self-loops on self-class calls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
19 fixes/features since 0.9.5. Highlights:
- Ruby: module/Struct.new/Class.new/Data.define container nodes (Graphify-Labs#1640) and
  constant-receiver singleton-call resolution (Graphify-Labs#1634) — Rails/Zeitwerk graphs
  now get real cross-file edges.
- Kill cross-language phantom imports_from edges from unresolved bare npm
  imports (Graphify-Labs#1638); harden semantic extraction against malformed LLM chunks
  (Graphify-Labs#1631); deterministic graph.json node/edge ordering for parallel semantic
  backends (Graphify-Labs#1632).
- Contributor extractor fixes: Apex interface multiple inheritance (Graphify-Labs#1645),
  Kotlin `by` delegation (Graphify-Labs#1644).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "What files it handles" code row omitted several extensions that reuse
existing tree-sitter grammars (so the grammar count is unchanged): `.mts`/`.cts`
(TypeScript, Graphify-Labs#1607, new in 0.9.6), `.cc`/`.cxx` (C++), `.kts` (Kotlin), `.psd1`
(PowerShell), `.toc` (Lua). Apex (`.cls`/`.trigger`) and Terraform already have
their own rows. `.r`/`.ejs`/`.ets` are intentionally left out — they are in
CODE_EXTENSIONS but have no registered extractor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…phantom cross-package edges (Graphify-Labs#1659)

When a callee had exactly one same-named definition repo-wide, the cross-file
resolver emitted a `calls` edge at INFERRED/0.8 even with no import path between
caller and callee. On a monorepo this fabricated dependencies: a 14-package repo
showed `platform`/`sidecar` depending on `registry-protocol` purely because it
exported generically-named symbols that unresolved calls collapsed onto.

JS/TS modules have no implicit cross-module scope, so a cross-file call is real
only if the caller imported it. Direct JS/TS cross-file `calls` attribution is
now gated on import evidence and left unresolved otherwise. Scoped to direct
calls: other languages keep the Graphify-Labs#1553 single-candidate resolution (C/C++
headers, Ruby autoload, same-package implicit scope), and the indirect_call path
(already INFERRED + callable-gated) is untouched.

Also hardens caller/candidate -> file mapping to resolve via the node's
`source_file` string (identifying the file node by its basename label) instead
of `relative_to(root.resolve())`, which threw on a path-resolution/symlink
mismatch and fell back to a non-matching absolute id — spuriously failing import
evidence. This both makes the new gate safe and fixes legitimate cross-file
calls being mislabeled INFERRED instead of EXTRACTED.

Full suite: 2898 passed, 3 skipped. Verified via CLI on the reporter's repro
(phantom dropped) and a control (imported call resolves EXTRACTED).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… cache word counts (Graphify-Labs#1649, Graphify-Labs#1655, Graphify-Labs#1656)

Graphify-Labs#1649: detect_incremental tracks the converted markdown sidecar, and
convert_office_file early-returned whenever the sidecar existed — so a .docx/
.xlsx edited after its first conversion never updated its sidecar and was
reported "unchanged" forever, freezing the graph. It now re-converts when the
source is newer than the sidecar (bumping the sidecar so the hash check catches
it); an unchanged source still skips the rewrite (Graphify-Labs#1226).

Graphify-Labs#1655: _md5_file/save_manifest/count_words used plain open()/stat(), which the
Windows file APIs reject for absolute paths over 260 chars unless prefixed with
`\\?\`. Deeply-nested files never hashed, their manifest entry never stabilized,
and detect_incremental re-flagged them as changed every run. A new _os_path adds
the extended-length prefix on win32 for change-detection I/O (mirror of
cache._normalize_path, which strips it for keys). No-op elsewhere.
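
A sketch of the prefixing rule, assuming a hypothetical `os_path` helper that receives an already-absolute path string; the `platform` parameter is there only so the logic can be exercised off Windows. UNC handling is included for completeness and is not confirmed by the description above.

```python
import sys

EXTENDED_PREFIX = "\\\\?\\"          # the literal \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # the literal \\?\UNC\


def os_path(path: str, platform: str = sys.platform) -> str:
    """Add the Windows extended-length prefix; no-op on other platforms."""
    if platform != "win32":
        return path
    if path.startswith(EXTENDED_PREFIX):
        return path  # already prefixed
    if path.startswith("\\\\"):
        # \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_UNC_PREFIX + path[2:]
    return EXTENDED_PREFIX + path
```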

Graphify-Labs#1656: detect() re-parsed every PDF/docx/text file to size the corpus on each
run. Word counts are now memoized in the existing content-hash stat index (keyed
by size + mtime_ns), so an unchanged file is parsed once. file_hash's fastpath is
guarded so a word-count-only entry (no hash) can't KeyError, and both writers
augment a co-located entry in place instead of clobbering the other's field.
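
The memoization could be sketched like this. `cached_word_count` is a hypothetical helper, and the index layout (a dict per path with `stamp`, `words`, and `hash` fields) is assumed; only the size + mtime_ns key and the augment-in-place behavior come from the description above.

```python
import os


def cached_word_count(path, index, count_words):
    """Return the word count, parsing only when size or mtime_ns changed."""
    st = os.stat(path)
    stamp = (st.st_size, st.st_mtime_ns)
    entry = index.setdefault(path, {})
    if entry.get("stamp") == stamp and "words" in entry:
        return entry["words"]
    if entry.get("stamp") != stamp:
        entry.clear()  # file changed: any co-located hash is stale too
        entry["stamp"] = stamp
    # Add the field in place so an existing hash for this stamp survives.
    entry["words"] = count_words(path)
    return entry["words"]
```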

Full suite: 2906 passed, 3 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… noise (Graphify-Labs#1635, Graphify-Labs#1646, Graphify-Labs#1657)

Graphify-Labs#1635: the windows skill variant declared `name: graphify-windows`, but
`graphify install --platform windows` writes it to ~/.claude/skills/graphify/
SKILL.md and Claude Code requires the folder name to equal the frontmatter
`name` — the suffix broke discovery. platforms.toml now sets name = "graphify"
(regenerated + re-blessed).

Graphify-Labs#1646: the OpenCode (and Kilo) plugin prepended its reminder with `&&`, which
Windows PowerShell 5.1 rejects as a statement separator, breaking the first
bash command of every session. Switched to `;` (valid in PowerShell 5.1, Bash,
POSIX).

Graphify-Labs#1657: the GRAPH_REPORT.md "Import Cycles" section printed "None detected" on
documents-only corpora where imports don't exist — now gated on code nodes /
import edges being present. The other two items in that issue (mojibake in
manifest/report, stdout encoding) are already handled on current v8: both files
are written UTF-8 and main() reconfigures stdout/stderr to UTF-8.

Full suite: 2909 passed, 3 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a benchmark writeup covering graphify as long-term memory (LOCOMO,
LongMemEval-S vs mem0/supermemory/bm25/dense/hybrid) and as a code-intelligence
layer (ERPNext), run on graphify's own harness with competitors as adapters:
one shared model (Kimi K2.6), identical budgets, shared BGE-m3 embedder where
allowed, and a judge blind-validated against a second judge (90.6% agreement,
kappa 0.81). The writeup leads with the wins, but every retained figure is exact;
the supermemory recall comparison is labeled embedder-confounded. README gets a
short Benchmarks section linking to it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…es (Graphify-Labs#1666)

krishnateja7 reported that on a full-repo run a stable subset of Ruby files
yields zero nodes (not even a file node). Each file extracts fine in isolation,
and the dropped set is byte-stable across runs. The root cause is a transient
batch/parallel extraction that produces an empty result, which is then cached
and persists.

Every extractable file yields at least a file node, so a zero-node result is
anomalous. Both extraction paths (parallel worker and sequential fallback) now
skip the cache write when a non-error result has no nodes, so a rerun re-extracts
and self-heals instead of loading the stale empty result. extract() also warns,
listing the files that landed in the graph with zero nodes, so the previously
silent blindness in affected/explain is visible.
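
A rough sketch of the skip-and-warn behaviour, assuming each result is a dict with a `nodes` list and an optional `error` key. The function name `extract_all`, the dict-based cache, and the choice to leave error results uncached are assumptions for illustration only:

```python
import warnings


def extract_all(paths, extract_one, cache):
    """Extract each path, caching only results that contain at least one node."""
    results, empty = {}, []
    for path in paths:
        cached = cache.get(path)
        if cached is not None:
            results[path] = cached
            continue
        result = extract_one(path)
        results[path] = result
        if result.get("error"):
            continue  # assumption: error results are not cached in this sketch
        if not result.get("nodes"):
            empty.append(path)  # anomalous: skip the cache write so a rerun retries
            continue
        cache[path] = result
    if empty:
        warnings.warn("zero nodes extracted for: " + ", ".join(sorted(empty)))
    return results
```

Because the empty result never reaches the cache, the next run calls `extract_one` again for that file instead of loading the stale entry.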

This addresses the persistence and the silent blindness. The underlying trigger
(why a valid file occasionally extracts empty when co-processed with certain
others) was not reproducible with synthetic corpora; the warning now surfaces it
for a concrete report if it recurs.

Full suite: 2912 passed, 3 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y-Labs#1241)

`_dynamic_import_js` emitted a deferred `import('./x')` as a plain
`imports_from` edge, so `find_import_cycles` counted it as a static import.
A file that statically imports another which dynamically imports it back was
reported as a phantom circular dependency.

Keep the edge as `imports_from` (the dependency stays visible in the graph)
but mark it `deferred`, and skip deferred edges in `find_import_cycles`.
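
To illustrate the idea, here is a minimal cycle finder that ignores deferred edges. The edge shape (dicts with `source`, `target`, `relation`, and an optional `deferred` flag) and the DFS itself are assumptions; the real `find_import_cycles` may work differently:

```python
def find_import_cycles_sketch(edges):
    """Return import cycles as node lists, skipping edges marked deferred."""
    graph = {}
    for edge in edges:
        if edge.get("relation") != "imports_from" or edge.get("deferred"):
            continue  # dynamic import('./x') stays in the graph but not in cycles
        graph.setdefault(edge["source"], set()).add(edge["target"])
        graph.setdefault(edge["target"], set())

    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    cycles = []

    def visit(node, stack):
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(graph[node]):
            if color[nxt] == GREY:
                # back edge: the cycle is the stack slice from nxt onward
                cycles.append(stack[stack.index(nxt):] + [nxt])
            elif color[nxt] == WHITE:
                visit(nxt, stack)
        stack.pop()
        color[node] = BLACK

    for node in sorted(graph):
        if color[node] == WHITE:
            visit(node, [])
    return cycles
```

With a static `a -> b` edge and a deferred `b -> a` edge, the sketch reports no cycle; with both edges static it reports `a -> b -> a`.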

Closes Graphify-Labs#1241
…aphify-Labs#1241

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…od-bound callers in affected (Graphify-Labs#1668, Graphify-Labs#1669)

Graphify-Labs#1668: Ruby `include`/`extend`/`prepend <Const>` in a class/module body now emits
a `mixes_in` edge to the module. The mixin is captured during the node walk and
resolved cross-file by resolve_ruby_member_calls (single-owner guard, reusing
the Graphify-Labs#1640 module nodes as targets). The shared call pass skips these markers so
they are not mislabeled as `calls`. `extend self` and non-constant args are
skipped; ambiguous/undefined modules produce no edge. Rails concern composition
is now visible to affected/explain.
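
The capture and single-owner resolution can be approximated as below. This sketch uses a line-based regex as a stand-in for the AST walk and does not check that the statement sits inside a class/module body; `collect_mixins`, `resolve_mixins`, and the `module_index` shape are hypothetical names, not the project's API:

```python
import re

# Matches `include Foo`, `extend Foo::Bar`, `prepend Baz` with a constant argument.
# `extend self` and lowercase (non-constant) arguments deliberately do not match.
_MIXIN_RE = re.compile(
    r"^\s*(include|extend|prepend)\s+([A-Z]\w*(?:::[A-Z]\w*)*)\s*$"
)


def collect_mixins(ruby_source):
    """Return (keyword, constant) pairs found in the source, in order."""
    found = []
    for line in ruby_source.splitlines():
        match = _MIXIN_RE.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


def resolve_mixins(owner_id, mixins, module_index):
    """Emit mixes_in edges only when exactly one module defines the constant.

    module_index maps a constant name to the node IDs that define it
    (assumed shape). Undefined or ambiguous names produce no edge.
    """
    edges = []
    for _keyword, const in mixins:
        owners = module_index.get(const, [])
        if len(owners) == 1:
            edges.append(
                {"source": owner_id, "target": owners[0], "relation": "mixes_in"}
            )
    return edges
```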

Graphify-Labs#1669: affected <Class> seeds the reverse walk with the root's own member nodes
(one method/contains hop) so callers that bind at method granularity (e.g.
Service.call -> the def self.call node, Graphify-Labs#1634) are reachable from the class.
method/contains stay out of the general relation-filtered walk (no forward
noise), and the seeded member nodes are not reported as hits.

Full suite: 2924 passed, 3 skipped. Verified end-to-end (Rails-shaped repros)
plus edge cases: extend self / undefined / ambiguous mixins emit nothing, mixins
are not emitted as calls, member methods aren't reported, class-level callers
still resolve, and one-hop seeding does not pull in downstream classes' methods.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the old v4-hosted SVG wordmark with the new brand logo (graph-cube
icon + "Graphify" on the green brand gradient), tightly cropped from the source
export (1384x645, ~2.15:1, even ~90px padding). Served from docs/logo.png on v8.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
detect.classify_file already labels extensionless files with a bash/python/
node/... shebang as CODE via _shebang_interpreter, but _get_extractor
dispatched purely on path.suffix — so a CLI entry point like `devctl` or
`manage` was detected as code and then silently contributed zero nodes to
the graph (its doc-referenced symbols stayed dangling stubs).

Resolve extensionless files through the same _shebang_interpreter and a
new _SHEBANG_DISPATCH map. Only interpreters with a real extractor are
mapped (python/bash-family/node/ruby/lua/php/julia); detect's wider set
(perl, fish, tcsh, Rscript) stays unmapped and skipped rather than being
mis-parsed by a wrong grammar.
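
A minimal sketch of the dispatch, under stated assumptions: the map contents, the extractor keys, the exact members of the "bash family", and the helper names below are illustrative and not copied from `_SHEBANG_DISPATCH` or `detect._shebang_interpreter`:

```python
from pathlib import Path

# Hypothetical map: interpreter name -> extractor key. Interpreters without a
# real extractor (perl, fish, tcsh, Rscript) are intentionally absent.
SHEBANG_DISPATCH_SKETCH = {
    "python": "python", "python3": "python",
    "bash": "bash", "sh": "bash", "zsh": "bash",  # assumed bash-family members
    "node": "javascript",
    "ruby": "ruby", "lua": "lua", "php": "php", "julia": "julia",
}


def shebang_interpreter_sketch(first_line):
    """Return the interpreter basename from a shebang line, or None."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].strip().split()
    if not parts:
        return None
    name = parts[0].rsplit("/", 1)[-1]
    if name == "env":
        # Skip env flags (e.g. -S) and VAR=value assignments.
        rest = [p for p in parts[1:] if not p.startswith("-") and "=" not in p]
        if not rest:
            return None
        name = rest[0].rsplit("/", 1)[-1]
    return name


def extractor_key_for(path):
    """Resolve an extensionless file to an extractor key via its shebang."""
    p = Path(path)
    if p.suffix:
        return None  # suffix-based dispatch is out of scope for this sketch
    try:
        with open(p, "r", encoding="utf-8", errors="replace") as fh:
            first_line = fh.readline()
    except OSError:
        return None
    return SHEBANG_DISPATCH_SKETCH.get(shebang_interpreter_sketch(first_line))
```

Leaving unmapped interpreters as `None` keeps a perl or fish script skipped rather than fed to a grammar that would mis-parse it.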

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@Stashub Stashub force-pushed the fix-extensionless-shebang-dispatch branch from 3ba6ab5 to 94239d6 July 5, 2026 13:37