fix(perl): eliminate false-positive Perl CALLS edges (builtins, framework method calls, config strings) by halindrome · Pull Request #477 · DeusData/codebase-memory-mcp

halindrome · 2026-06-16T13:17:54Z

Summary

Perl call sites are extracted, and any call the textual resolver can't place falls back to a generic short-name matcher with no language or call-kind awareness. That matcher wires Perl builtins, framework method calls, and mis-parsed config strings to unrelated project subs that merely share a name, polluting the Perl call graph with false-positive CALLS edges.

This PR fixes all three sources, each gated on CBM_LANG_PERL, so the generic resolver and CBMCall (shared by all 10 languages) stay byte-identical for non-Perl code.

What changed

- fix(perl): stop extracting config strings as call targets. The Perl branch of extract_scripting_callee now extracts the real method/function name token and rejects non-identifier callees (containing ., quotes, whitespace, …), so dotted config strings/literals (e.g. log4perl.appender.File.utf8) never become call targets.
- fix(resolver): don't match Perl builtins to project subs. Adds a curated Perl builtin set (src/pipeline/registry.c); when an unresolved Perl call's name is a builtin, the generic edge is suppressed. Real same-file subs are already resolved by earlier stages before the generic fallback, so this only drops spurious builtin matches.
- fix(resolver): suppress generic short-name matching for Perl method calls. Adds is_method to CBMCall (default false, so a no-op for other languages), set during Perl extraction for arrow/method calls and threaded into resolve_single_call and pass_parallel's resolver. Perl method calls with an unknown receiver no longer generic-match to free subs (precise method resolution is the LSP's job).
- Tests in tests/test_extraction.c and tests/test_registry.c covering builtins, config-string rejection, method-call suppression, a genuine-call-still-resolves case, and a cross-language no-op check.

Validation

Re-indexing a large real Perl monorepo (~1,200 modules + 352 .cgi endpoints) with the fix:

| metric | before | after |
| --- | --- | --- |
| `.cgi` `suffix_match` edges | 4,940 | 655 (−87%) |
| `.cgi` builtin / CPAN-method / config-string noise | ~4,000 | 0 |
| project-wide `CALLS` edges | ~182.5k | ~169.4k (−13.4k noise removed) |

This is precision via noise removal — fewer, more-correct edges. Genuine intra-project resolution survives.

scripts/build.sh — clean (-Werror).
scripts/test.sh — green except the unrelated pre-existing cli_hook_gate_script_no_predictable_tmp_issue384; cross-language breadth check [CALLS-BREADTH] 53 langs: 0 FAILURES confirms all other languages still resolve.
clang-format — clean on changed files.

Closes #476

🤖 Generated with Claude Code

The Perl branch of extract_scripting_callee blindly returned the text of child(0) of every call node. In config-heavy Perl (.cgi/.pl with embedded log4perl-style config), tree-sitter-perl misparses dotted config tokens (e.g. "log4perl.appender.File.utf8") into call-shaped nodes, and that dotted string was emitted as a callee_name, later matched by the generic short-name resolver to unrelated project subs.

Now the Perl branch pulls the real name token (method/function field, else child(0)) and validates it as a bare Perl sub/method identifier via perl_is_identifier_callee: must start with a letter or '_' and contain only [A-Za-z0-9_:] (allowing the '::' package separator). Any '.', whitespace, quote, or '/' disqualifies it and NULL is returned so no CALLS edge forms. Gated to CBM_LANG_PERL; other languages are untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>

The generic name-matcher in cbm_registry_resolve has no language or call-kind awareness, so Perl builtins (push/shift/keys/sprintf/...) pass through it like any other name. When a project defines a sub whose name collides with a builtin, an invocation of the builtin was wired to that sub by same-module / suffix matching, producing a false-positive CALLS edge.

Adds cbm_perl_is_builtin (curated, sorted bsearch set of 94 perlfunc core builtins) and applies it in both call-resolution passes (sequential resolve_single_call and parallel resolve_file_calls), gated on the file language == CBM_LANG_PERL and only AFTER LSP resolution has declined, so a genuine LSP-resolved call is never suppressed. The file language is threaded into both resolvers via a new trailing CBMLanguage parameter; every other language reaches cbm_registry_resolve unchanged (byte-identical behavior).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>
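The sorted-set lookup this commit describes could look roughly like the sketch below. It is only an illustration: `perl_is_builtin_sketch` and its eight-entry table are hypothetical stand-ins for `cbm_perl_is_builtin` and the PR's 94-entry set, whose exact contents are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset only; the PR's table holds 94 perlfunc builtins.
 * Entries must stay in strcmp() order for bsearch() to work. */
static const char *const perl_builtins_sketch[] = {
    "chomp", "defined", "keys", "open", "print", "push", "shift", "sprintf",
};

/* bsearch comparator: key is the name, elem points at a table entry. */
static int cmp_name(const void *key, const void *elem) {
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Hypothetical stand-in for cbm_perl_is_builtin: exact, case-sensitive
 * membership test; NULL and the empty string are never builtins. */
bool perl_is_builtin_sketch(const char *name) {
    if (name == NULL || name[0] == '\0') return false;
    size_t n = sizeof perl_builtins_sketch / sizeof perl_builtins_sketch[0];
    return bsearch(name, perl_builtins_sketch, n,
                   sizeof perl_builtins_sketch[0], cmp_name) != NULL;
}

/* Usage: mirrors the kinds of cases the PR's unit test is said to cover. */
void builtin_examples(void) {
    assert(perl_is_builtin_sketch("chomp"));   /* first entry of this subset */
    assert(perl_is_builtin_sketch("sprintf")); /* last entry of this subset */
    assert(!perl_is_builtin_sketch("Push"));   /* case variant */
    assert(!perl_is_builtin_sketch("helper")); /* project sub */
    assert(!perl_is_builtin_sketch(NULL));
}
```

Checking the first and last entries is a cheap way to catch a table that has drifted out of sorted order, since bsearch silently misses names in an unsorted array.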

…alls

A Perl method call ($obj->m / Class->m) carries no receiver type at the structural tier, so the generic short-name matcher in cbm_registry_resolve would wire $dbh->commit / $cgi->param / $logger->log to any project sub sharing the bare method name - the dominant source of false-positive CALLS edges in CPAN/framework-heavy Perl. Resolving such a call correctly is the LSP's job, not the bare-name matcher's.

Adds a CBMCall.is_method flag (zero-init false, so all other languages and existing call sites are unaffected). method_call_expression is added to the Perl call node set and handle_calls sets is_method=true only for that node type when the file language is Perl. Both call-resolution passes then skip generic resolution for Perl method calls (combined with the builtin guard from the prior commit). Genuine intra-project function calls (non-method, non-builtin) still resolve as before. LSP-resolved method calls are unaffected because the guard runs only after LSP resolution declines.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>
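A minimal sketch of the flag-setting rule in this commit, assuming a much simpler shape than the real code. `SketchLang`, `SketchCall`, and `sketch_make_call` are hypothetical stand-ins for CBMLanguage, CBMCall, and the relevant part of handle_calls; only the `is_method` field and the `method_call_expression` node type come from the PR text, and the other node type strings are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for CBMLanguage and CBMCall. */
typedef enum { SKETCH_LANG_JS, SKETCH_LANG_PERL } SketchLang;

typedef struct {
    const char *callee_name;
    bool is_method; /* zero-initialized: false unless the Perl branch sets it */
} SketchCall;

/* Builds a call record; the flag is set only for a Perl
 * method_call_expression node, so every other language sees false. */
SketchCall sketch_make_call(SketchLang lang, const char *node_type,
                            const char *callee_name) {
    SketchCall call = {0};
    call.callee_name = callee_name;
    if (lang == SKETCH_LANG_PERL &&
        strcmp(node_type, "method_call_expression") == 0) {
        call.is_method = true;
    }
    return call;
}

void is_method_examples(void) {
    /* $dbh->commit in Perl: flagged as a method call. */
    assert(sketch_make_call(SKETCH_LANG_PERL, "method_call_expression",
                            "commit").is_method);
    /* helper(...) in Perl: a plain function call (placeholder node type). */
    assert(!sketch_make_call(SKETCH_LANG_PERL, "function_call_expression",
                             "helper").is_method);
    /* A JS method call never sets the flag (placeholder node type). */
    assert(!sketch_make_call(SKETCH_LANG_JS, "call_expression",
                             "commit").is_method);
}
```

Because the struct is zero-initialized and only one guarded branch writes the field, existing construction sites need no changes to keep their old behavior.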

…trings)

Hermetic tests for the three Perl call-graph noise fixes:

test_extraction.c (extraction tier):
- config string is never emitted as a callee; genuine call still extracted
- builtin calls (push/keys) extracted but never flagged is_method
- arrow/method calls ($self->commit / $dbh->commit) set is_method=true, while the genuine function call (helper) does not
- a JS method call never sets is_method (flag is Perl-only; other languages unaffected)

test_registry.c (resolver tier):
- cbm_perl_is_builtin recognizes core builtins (incl. first/last of the sorted set) and rejects project subs, case variants, empty, and NULL

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>

Round 1 (Claude panel + DO DeepSeek) findings:

- Rework the Perl noise guard so it suppresses only WEAK generic matches (suffix_match/unique_name) and KEEPS high-confidence same_module/import_map. The prior guard ran before cbm_registry_resolve and dropped genuine same-file calls to builtin-named subs (e.g. a project sub log/index/open called as a bare function) that pre-PR resolved via same_module. Extracted the decision into pure, unit-tested cbm_perl_suppress_generic_match() shared by the sequential (pass_calls.c) and parallel (pass_parallel.c) resolvers; corrected the inaccurate comments (Perl has no LSP/textual stage before the guard).
- Tighten perl_is_identifier_callee to require '::' pairs (reject a lone ':', ':::', or trailing '::').
- Add resolver-contract tests covering weak-match suppression, same_module/import_map retention, genuine-call survival, non-Perl no-op, and NULL strategy.

Verified on a real Perl monorepo: .cgi builtin/CPAN/config-string noise stays eliminated while same_module edges to builtin-named subs are recovered.

Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>

halindrome · 2026-06-16T14:14:58Z

QA Round 1

Reviewers: Claude Code (claude-opus-4-8) parallel review panel (3 lenses) + DigitalOcean DeepSeek (deepseek-v4-pro, --double).

Contract Verification (issue #476)

| Criterion | Verdict | Evidence |
| --- | --- | --- |
| (a) builtin call does not edge to same-named project sub | pass | suppressed at the resolver; builtin set bsearch-correct (94 entries) |
| (b) method call w/ unknown receiver not generic-matched | pass | `is_method` set at `extract_calls.c` for `method_call_expression` (in `perl_call_types`); suppressed at resolver |
| (c) config-string token not extracted as callee | pass | `perl_is_identifier_callee` rejects non-identifier tokens |
| (d) genuine intra-project Perl calls still resolve | pass after fix | see Finding 2: guard reworked to keep `same_module`/`import_map` |
| (e) non-Perl languages byte-identical | pass | all suppression gated on Perl; `[CALLS-BREADTH] 53 langs: 0 FAILURES` |
| (f) tests cover the above | pass after fix | see Finding 1: added resolver-contract tests |

Findings

[claude] Finding 1 — Test gap (major, fixed). The suppression branch and edge-survival path had no resolver-level test (extraction-level + a builtin-set unit test only); the test comments promised end-to-end coverage that didn't exist. Fixed: extracted the suppression decision into pure cbm_perl_suppress_generic_match() and added contract tests (weak-match suppression, same_module/import_map retention, genuine-call survival, non-Perl no-op, NULL strategy).

[claude] Finding 2 — Builtin guard over-suppressed genuine same-file calls (regression, fixed). Perl has no LSP resolver, so the guard ran before cbm_registry_resolve, dropping a genuine same-file call to a builtin-named sub (e.g. a project sub log/index/open called as a bare function) that pre-PR resolved via same_module. Reworked: resolve first, then suppress only weak strategies (suffix_match/unique_name) and keep high-confidence same_module/import_map. Corrected the inaccurate inline comments. Verified on a real Perl monorepo: .cgi noise stays eliminated while same_module edges to builtin-named subs are recovered.

[claude|do:deepseek-v4-pro] Finding 3 — perl_is_identifier_callee accepted a lone : (minor, fixed). It allowed any :, so Foo:Bar/::x/Foo:::Bar would pass. Tightened to require :: pairs (reject lone :, :::, trailing ::). Harmless in practice (the grammar never emits such callee tokens) but now matches its own docstring.

[do:deepseek-v4-pro] Finding 4 — locale-dependent isalpha/isalnum (minor, advisory, not changed). Byte classification for identifiers is locale-dependent for bytes >127. Left as-is for this round (consistent with surrounding code; no reachable trigger from the grammar). Tracked as a possible follow-up.

[do:deepseek-v4-pro] "is_method never set in extraction" (reported critical) — REFUTED (false positive). The DO reviewer chunked the diff by commit and reviewed the config-string commit in isolation. call.is_method is set at internal/cbm/extract_calls.c (handle_calls, method_call_expression) and method_call_expression is in perl_call_types (lang_specs.c). Verified present and correct.
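The tightened identifier rule from Finding 3 can be sketched as follows. This is an approximation under stated assumptions, not the PR's code: `perl_is_identifier_callee_sketch` is a hypothetical name, it uses explicit ASCII ranges rather than the isalpha/isalnum calls noted in Finding 4, and it assumes any identifier character may follow a `::` pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASCII-only classification (a deliberate difference from the
 * locale-dependent isalpha/isalnum mentioned in Finding 4). */
static bool is_ident_start(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

static bool is_ident_char(char c) {
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Accepts bare or '::'-qualified names such as helper or Foo::Bar::baz.
 * Rejects a lone ':', ':::', a trailing '::', a leading '::', and any
 * other character ('.', quotes, whitespace, '/'). */
bool perl_is_identifier_callee_sketch(const char *s) {
    if (s == NULL || !is_ident_start(s[0])) return false;
    const char *p = s + 1;
    while (*p != '\0') {
        if (is_ident_char(*p)) {
            p++;
            continue;
        }
        /* p[1] is read only when p[0] == ':' and p[2] only when
         * p[1] == ':', so the lookahead stops at the terminating NUL. */
        if (p[0] == ':' && p[1] == ':' && is_ident_char(p[2])) {
            p += 2;
            continue;
        }
        return false;
    }
    return true;
}

void identifier_examples(void) {
    assert(perl_is_identifier_callee_sketch("helper"));
    assert(perl_is_identifier_callee_sketch("Foo::Bar::baz"));
    assert(!perl_is_identifier_callee_sketch("Foo:Bar"));   /* lone ':' */
    assert(!perl_is_identifier_callee_sketch("Foo:::Bar")); /* ':::' */
    assert(!perl_is_identifier_callee_sketch("Foo::"));     /* trailing '::' */
    assert(!perl_is_identifier_callee_sketch("log4perl.appender.File.utf8"));
}
```

A rejected name only ever costs a missing edge, never a false one, which is why the check can afford to be strict.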

Result

3 confirmed findings fixed (1 major + 1 regression + 1 minor); 1 advisory deferred; 1 reported-critical refuted. Fixes committed as fix(perl): address QA round 1. Build clean (-Werror), clang-format clean, suite green (5611 passed; the 1 failure cli_hook_gate_script_no_predictable_tmp_issue384 is a pre-existing sandbox-only flake unrelated to this PR).

SAST: GitHub code scanning is not enabled on this repo — security delta skipped (non-blocking).

QA performed by Claude Code (claude-opus-4-8) parallel panel + do:deepseek-v4-pro

Round 2 (Claude panel) caught a regression introduced by the round-1 refactor: cbm_perl_suppress_generic_match whitelisted only the exact strategies "same_module" and "import_map", but resolve_import_map can also return "import_map_suffix" (confidence 0.85: a genuine import-based resolution, not a weak short-name guess). A '::'-qualified Perl builtin/method call resolved via the import-suffix fallback was therefore dropped, contradicting the helper's documented contract and partially missing acceptance criterion (d).

Add import_map_suffix to the kept (high-confidence) set so only the weak short-name strategies (suffix_match / unique_name) are suppressed; update the doc comment and add a unit-test case asserting import_map_suffix is retained.

Deferred as advisory (non-blocking, noted on the PR): a hypothetical leading-'::' (main:: shorthand) under-extraction in perl_is_identifier_callee, and a colon-edge-case coverage gap (logic correct by inspection).

Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>
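The keep/drop rule this commit settles on can be sketched as a pure function. The signature below is hypothetical (the real parameters of cbm_perl_suppress_generic_match are not shown on this page), and returning false for a NULL strategy is an assumption; only the strategy names come from the PR text.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for cbm_perl_suppress_generic_match().
 * Returns true when a resolved edge should be dropped as noise.
 *
 * Kept (high-confidence): same_module, import_map, import_map_suffix.
 * Dropped (weak short-name guesses): suffix_match, unique_name. */
bool perl_suppress_generic_match_sketch(bool is_perl,
                                        bool is_builtin_or_method,
                                        const char *strategy) {
    /* Non-Perl files and ordinary Perl function calls are never touched. */
    if (!is_perl || !is_builtin_or_method) return false;
    /* Assumption: with no strategy reported there is nothing to suppress. */
    if (strategy == NULL) return false;
    return strcmp(strategy, "suffix_match") == 0 ||
           strcmp(strategy, "unique_name") == 0;
}

void suppress_examples(void) {
    /* Weak matches on a Perl builtin or method call are dropped. */
    assert(perl_suppress_generic_match_sketch(true, true, "suffix_match"));
    assert(perl_suppress_generic_match_sketch(true, true, "unique_name"));
    /* High-confidence strategies survive, including import_map_suffix. */
    assert(!perl_suppress_generic_match_sketch(true, true, "same_module"));
    assert(!perl_suppress_generic_match_sketch(true, true, "import_map"));
    assert(!perl_suppress_generic_match_sketch(true, true, "import_map_suffix"));
    /* Other languages and genuine Perl function calls are unaffected. */
    assert(!perl_suppress_generic_match_sketch(false, true, "suffix_match"));
    assert(!perl_suppress_generic_match_sketch(true, false, "suffix_match"));
}
```

Listing the strategies to drop, rather than the ones to keep, means a newly added high-confidence strategy is retained by default, which avoids repeating the round-1 omission.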

halindrome · 2026-06-16T14:46:24Z

QA Round 2

Reviewer: Claude Code (claude-opus-4-8) parallel review panel (3 lenses). The DigitalOcean DeepSeek second opinion timed out this round (~7 min, no output) — recorded as a non-blocking second-opinion failure; round 1 captured a DO opinion.

Contract Verification (issue #476)

| Criterion | Verdict | Note |
| --- | --- | --- |
| (a) builtin call → no edge to same-named sub | pass | |
| (b) method call w/ unknown receiver → no generic match | pass | |
| (c) config-string token not extracted | pass | |
| (d) genuine intra-project Perl calls still resolve | pass after fix | see Finding 1 |
| (e) non-Perl byte-identical | pass | `[CALLS-BREADTH] 53 langs: 0 FAILURES` |
| (f) tests cover the above | pass | + new `import_map_suffix` retention case |

Findings

[claude] Finding 1 — round-1 helper whitelist omitted import_map_suffix (regression, fixed). The round-1 cbm_perl_suppress_generic_match kept only same_module/import_map, but resolve_import_map also returns import_map_suffix (confidence 0.85 — a genuine import resolution, above weak unique_name/suffix_match). A ::-qualified Perl builtin/method call resolved via the import-suffix fallback was therefore wrongly dropped, partially missing criterion (d). Fixed: import_map_suffix added to the kept set; doc comment corrected; unit test added asserting it is retained. Only the weak short-name strategies (suffix_match/unique_name) are now suppressed.

[claude] Finding 2 — leading-:: (main:: shorthand) under-extraction (minor, hypothetical, advisory — deferred). perl_is_identifier_callee rejects a callee beginning with :: (e.g. ::foo == main::foo) because the first char must be a letter/_. This can only ever miss an edge, never create a false one, and it's unverified whether tree-sitter-perl surfaces a leading-:: callee token (the grammar is compiled). Not a noise/over-extraction criterion. Left as a tracked advisory rather than adding speculative code.

[claude] Finding 3 — colon-edge-case test gap (minor, advisory — deferred). perl_is_identifier_callee is a static helper exercised only indirectly; there's no focused test feeding :::, trailing ::, Foo::Bar::baz, or SUPER::method. The reviewer traced each case and confirmed the logic is correct (accepts qualified/_-leading names; rejects lone :/:::/trailing ::; no read-past-terminator). Coverage debt, not a defect — tracked as advisory.

[claude] Finding 4 — pre-existing unrelated test failure. cli_hook_gate_script_no_predictable_tmp_issue384 (tests/test_cli.c) fails reading a hook-gate file under a sandboxed /tmp dir. The PR touches no cli/hook files; confirmed failing on clean upstream/main as well — environmental, out of scope for this PR.

Result

1 confirmed regression fixed (fix(perl): address QA round 2); 2 minor/hypothetical items deferred as advisory; the 1 suite failure is pre-existing and unrelated. Build clean (-Werror), clang-format clean, suite green (5611 passed; only the pre-existing issue384). With (d) now passing, all contract criteria are met.

QA performed by Claude Code (claude-opus-4-8) parallel panel (DO DeepSeek second opinion timed out — non-blocking)

halindrome · 2026-06-16T15:11:26Z

QA Round 3 (confirming) — CLEAN ✅

Reviewer: Claude Code (claude-opus-4-8) parallel review panel (3 lenses). All three lenses returned empty findings. (The DigitalOcean DeepSeek second opinion did not complete within the window again this round — recorded as a non-blocking second-opinion failure; a DO opinion was captured in round 1.)

Contract Verification (issue #476) — all pass

| Criterion | Verdict |
| --- | --- |
| (a) builtin call → no false edge to same-named sub | pass |
| (b) config-string token never extracted as callee | pass |
| (c) arrow/method calls flagged + generic match suppressed | pass |
| (d) genuine same-file/imported calls still resolve | pass |
| (e) suppression Perl-gated; other languages byte-identical | pass |
| (f) both sequential + parallel resolvers apply it consistently, after `cbm_registry_resolve` | pass |

Verification highlights

Round-2 fix confirmed complete. The panel enumerated every strategy literal cbm_registry_resolve can emit — {import_map, import_map_suffix, same_module} (kept) and {suffix_match, unique_name} (dropped) — and confirmed the keep/drop partition is exhaustive. The fuzzy strategy comes only from a separate resolver never fed to the helper (not a gap).
perl_is_identifier_callee re-audited for read-past-terminator on the :: lookahead: p[2] is read only when p[1]==':' (worst case NUL) — no OOB read.
Perl-gating verified airtight: CBMCall.is_method is zero-initialized at both construction sites, so non-Perl behavior is byte-identical; [CALLS-BREADTH] 53 langs: 0 FAILURES.
Schema: none. SAST: code scanning not enabled (skipped).

Result

No new or remaining confirmed defects. The two prior advisory items (hypothetical leading-:: under-extraction; colon-edge-case test gap — logic verified correct) remain non-blocking and were not re-raised. The single suite failure (cli_hook_gate_script_no_predictable_tmp_issue384) is pre-existing on upstream/main and out of scope.

3 QA rounds complete; round 3 clean. Marking ready for review.

QA performed by Claude Code (claude-opus-4-8) parallel panel

shanemccarron-maker and others added 5 commits June 16, 2026 08:17

halindrome marked this pull request as ready for review June 16, 2026 15:11

halindrome wants to merge 6 commits into DeusData:main from halindrome:perl-call-graph-noise
