@@ -0,0 +1,91 @@
---
phase: 22
plan: 03
title: Core semantic resolution (package, sub, method, ISA, bless, exports)
status: complete
tasks_completed: 5
tasks_total: 5
commit_hashes:
- 69d9025 # feat(perl-lsp): implement core semantic resolution (package/sub/method/ISA/bless/exports)
files_modified:
- internal/cbm/lsp/perl_lsp.c
branch: perl-lsp-semantic-resolution
build_status: green # scripts/build.sh exit 0, binary at build/c/codebase-memory-mcp
test_status: green-except-preexisting # scripts/test.sh: 3553 passed, 1 pre-existing failure (DEVN-05)
clang_format: clean # CommandLineTools clang-format --dry-run --Werror, no diff
ac_results:
  - truth: "perl_lsp_process_file does a two-pass walk: (1) package_statement + use_statement collection, (2) subroutine_declaration_statement processing"
    result: pass
  - truth: "Method calls ($obj->m, Class->m, $self->m) and Package::sub() calls resolve to CBMResolvedCall edges"
    result: pass
  - truth: "MRO is resolved via @ISA assignment, use parent, and use base"
    result: pass
  - truth: "bless($ref,'Class') and the ref($class)||$class idiom bind a variable to a package type"
    result: pass
  - truth: "perl_eval_expr_type is recursion-guarded via eval_depth (cap mirrors php_eval_expr_type)"
    result: pass
  - truth: "Unresolvable receivers emit NO spurious edge (mirrors phplsp_unindexed_receiver_emits_block)"
    result: pass
pre_existing_issues:
- '{"test": "search_code_multi_word", "file": "tests/test_mcp.c", "error": "tests/test_mcp.c:694 ASSERT(strstr(resp, \"HandleRequest\") != NULL) failed — multi-word search-code MCP test; unrelated to Perl LSP and not in this plan''s file set (DEVN-05 pre-existing, identical to plan 22-01/02 baseline)"}'
deviations:
- "DEVN-05 (pre-existing): scripts/test.sh reports 3553 passed / 1 failed, identical to the plan 22-01/02 baseline. The single failure is search_code_multi_word (tests/test_mcp.c:694), an MCP search-code test unrelated to the Perl LSP. Out of scope; not fixed."
- "DEVN-01 (minor): the plan lists 5 tasks all editing the single file perl_lsp.c with deeply interdependent functions (process_subroutine calls eval_expr_type and the call/method dispatch; the entry point wires all of them). Splitting into 5 commits would produce non-compiling intermediate states, violating the build-green-per-commit invariant. Delivered as ONE atomic commit for the cohesive resolver. All five tasks' functionality is present and individually verified end-to-end against a multi-package fixture."
- "DEVN-04 (architectural, downstream — flagged for orchestrator): the resolver correctly POPULATES result->resolved_calls for typed method calls ($obj->m, Class->m, $self->m) and Package::sub() — verified empirically (7 correct CBMResolvedCall entries on the Base/Derived/main fixture, zero on unresolvable receivers). HOWEVER, those resolved method-call edges do not currently surface as graph CALLS edges because the pipeline bridge (src/pipeline/pass_calls.c / pass_parallel.c via cbm_pipeline_find_lsp_resolution) only refines EXISTING structural call edges, and `method_call_expression` is NOT in perl_call_types in internal/cbm/lang_specs.c (line 542) — so the structural tier never emits a method-call edge for the bridge to attach to. Adding `method_call_expression` (and optionally a Package::sub callee-name normalization) to lang_specs.c is required for typed method-call edges to appear in the graph, but lang_specs.c is OUTSIDE this plan's allowed_paths (files_modified: [internal/cbm/lsp/perl_lsp.c] only). The plan's must-have truths and verification are framed around CBMResolvedCall emission, which is fully satisfied; the graph-edge surfacing is a one-line structural-tier follow-up that a subsequent plan (with lang_specs.c in scope) should make. Static Package::sub() calls ARE structural calls, but their structural callee_name is the qualified `Pkg::sub` while the bridge compares the last dot-segment of the resolved callee_qn — that normalization also belongs with the lang_specs follow-up."
- "DEVN-04 RESOLVED (follow-up fix, 2026-06-13): the two graph-surfacing gaps above are now closed in a dedicated follow-up commit. (1) `method_call_expression` added to perl_call_types in internal/cbm/lang_specs.c so the structural tier emits a method-call edge (callee_name = bare method, via the field-based extractor's `method` branch) for the bridge to attach the LSP resolution to. (2) cbm_pipeline_find_lsp_resolution in src/pipeline/lsp_resolve.h now reduces the textual callee_name to its last `::`-separated segment before comparing, so qualified static `Pkg::sub()` calls match the resolved sub's short name. Verified on a fresh Base/Derived/main fixture: trace_path outbound now shows run_typed->{greet,describe}, run_static->helper, run_classcall->greet, describe->greet (inherited Base::greet), while run_untyped (untyped `$thing->mystery()` / `$unknown->whatever()`) yields ZERO edges — zero-edge guarantee preserved. Build green; scripts/test.sh 3553 passed / 1 pre-existing failure (search_code_multi_word, DEVN-05)."
---

## What Was Built

Replaced the plan-01 no-op `perl_lsp_process_file` / stub helpers with the full
Perl Light Semantic Pass inside `perl_lsp.c`, mirroring `php_lsp.c`'s resolution
architecture (only `perl_lsp.c` modified — disjoint from plan-02's stdlib seed).

Resolution scenarios implemented and verified end-to-end (indexed a
Base/Derived/main multi-package fixture with `CBM_LSP_DEBUG=1` and confirmed the
emitted `CBMResolvedCall` set):

- **Two-pass process_file.** PASS 1 walks the file for `package_statement`
boundaries (packages can switch mid-file), `@ISA` assignments, `use parent` /
`use base` inheritance, and Exporter `use Module qw(f1 f2)` imports
(f1 → Module::f1). PASS 2 walks each `subroutine_declaration_statement`.
- **process_subroutine + invocant binding.** Pushes a scope, sets
`enclosing_func_qn = module_qn.subname` (the structural QN scheme, verified
against `helpers.c cbm_enclosing_func_qn` — Perl has no class node type so the
package is not woven into the sub QN), and binds the `$self`/`$class` invocant
(`my $X = shift` idiom) to the enclosing package type.
- **perl_eval_expr_type (sigil-aware, recursion-guarded).** Scalar scope lookup;
`method_call_expression` and `function_call_expression` dispatch;
`bless($r,'Class')` literal recognition (conf 0.95) and the
`ref($class)||$class` / bare `$class` idiom → enclosing package (conf 0.75);
assignment-RHS propagation; `ClassName->new` → `ClassName`. Guarded by
`eval_depth` (cap 8, mirroring php).
- **@ISA / use parent / use base MRO.** All three forms feed a per-package
`CBMRegisteredType.embedded_types` (multiple inheritance as a `const char**`
array); `perl_lookup_method` walks the chain depth-first with cycle detection,
bounded by `CBM_LSP_MAX_LOOKUP_DEPTH`.
- **Call/method dispatch + emit.** `Package::sub()` static, bare/imported
`func()`, and typed-receiver `$obj->m` / `Class->m` / `$self->m` push
`CBMResolvedCall` via `cbm_resolvedcall_push`. Unresolvable receivers emit NO
edge (zero-edge guarantee verified: 0 resolved on a fixture with an untyped
`$x->bar()` and `$unknown->baz()`); symbol-table aliasing is ignored.
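The MRO walk described in the list above can be sketched as follows. This is a minimal illustration, not the code in `perl_lsp.c`: the `Package` struct, `lookup_method`, and the value of `MAX_LOOKUP_DEPTH` are hypothetical stand-ins for `CBMRegisteredType.embedded_types`, `perl_lookup_method`, and `CBM_LSP_MAX_LOOKUP_DEPTH`, whose real definitions are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed bound; stands in for CBM_LSP_MAX_LOOKUP_DEPTH (real value not shown). */
#define MAX_LOOKUP_DEPTH 8

/* Hypothetical per-package record: own subs plus ordered @ISA parents. */
typedef struct {
    const char *name;           /* package name, e.g. "Derived" */
    const char *const *methods; /* NULL-terminated sub names, may be NULL */
    const char *const *parents; /* NULL-terminated parent names, may be NULL */
} Package;

static const Package *find_package(const Package *pkgs, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(pkgs[i].name, name) == 0) return &pkgs[i];
    }
    return NULL;
}

static int has_method(const Package *p, const char *method) {
    for (const char *const *m = p->methods; m && *m; m++) {
        if (strcmp(*m, method) == 0) return 1;
    }
    return 0;
}

/* Depth-first walk. `path` records the packages on the current chain so that
 * a cyclic @ISA (A isa B isa A) terminates instead of recursing forever. */
static const char *lookup_rec(const Package *pkgs, size_t n, const char *pkg_name,
                              const char *method, const char **path, int depth) {
    if (depth >= MAX_LOOKUP_DEPTH) return NULL; /* depth bound reached */
    for (int i = 0; i < depth; i++) {
        if (strcmp(path[i], pkg_name) == 0) return NULL; /* cycle detected */
    }
    const Package *p = find_package(pkgs, n, pkg_name);
    if (!p) return NULL; /* unindexed package: no owner, so no edge */
    if (has_method(p, method)) return p->name;
    path[depth] = pkg_name;
    for (const char *const *par = p->parents; par && *par; par++) {
        const char *owner = lookup_rec(pkgs, n, *par, method, path, depth + 1);
        if (owner) return owner; /* first parent that provides it wins */
    }
    return NULL;
}

/* Returns the name of the package that defines `method`, or NULL. */
static const char *lookup_method(const Package *pkgs, size_t n, const char *pkg_name,
                                 const char *method) {
    const char *path[MAX_LOOKUP_DEPTH];
    return lookup_rec(pkgs, n, pkg_name, method, path, 0);
}
```

A NULL result is what lets the caller skip emission entirely, which is how an unresolvable receiver ends up with zero edges in this sketch.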

Tree-sitter-perl node/field names (Open Questions #1–3) were verified against
the vendored compiled grammar `internal/cbm/vendored/grammars/perl/parser.c`
(`ts_symbol_names` + `ts_field_names` tables; no node-types.json/grammar.js is
vendored). Confirmed and documented in a file-header comment:
`method_call_expression` → fields `invocant` + `method`; `package_statement` →
field `name`; `use_statement` → field `module` + `quoted_word_list`;
`variable_declaration` target → field `variable` (singular, not `variables`);
`bless`/parent args nest inside `list_expression`.
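The `eval_depth` guard on `perl_eval_expr_type` mentioned earlier in this section can be illustrated with a small sketch. The `Expr` and `EvalCtx` types below are hypothetical, and whether the real check is `>=` or `>` against the cap of 8 is an assumption; only the cap value itself comes from this summary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EVAL_DEPTH_CAP 8 /* cap stated in this summary; comparison style is assumed */

/* Hypothetical, heavily reduced expression model. */
typedef enum { EXPR_CLASS_NEW, EXPR_ASSIGN, EXPR_UNKNOWN } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *class_name; /* used by EXPR_CLASS_NEW: ClassName->new */
    const struct Expr *rhs; /* used by EXPR_ASSIGN: type flows from the RHS */
} Expr;

typedef struct {
    int eval_depth; /* current recursion depth, restored on every return */
} EvalCtx;

/* Returns the inferred package name, or NULL when the type is unknown or the
 * recursion cap is hit. The counter is decremented on the way out so that one
 * deep expression does not poison later evaluations. */
static const char *eval_expr_type(EvalCtx *ctx, const Expr *e) {
    if (e == NULL || ctx->eval_depth >= EVAL_DEPTH_CAP) return NULL;
    ctx->eval_depth++;
    const char *type = NULL;
    switch (e->kind) {
    case EXPR_CLASS_NEW:
        type = e->class_name;
        break;
    case EXPR_ASSIGN:
        type = eval_expr_type(ctx, e->rhs);
        break;
    case EXPR_UNKNOWN:
        break;
    }
    ctx->eval_depth--;
    return type;
}
```

Returning NULL at the cap, rather than a guess, keeps the behaviour consistent with the zero-edge guarantee: an expression too deep to evaluate is treated as untyped.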

Build green; `scripts/test.sh` reports 3553 passed with the single pre-existing
unrelated failure noted above; `perl_lsp.c` is clang-format clean.
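The callee-name normalization recorded under DEVN-04 (reduce the textual callee to its last `::` segment, then compare against the last dot-segment of the resolved QN) can be sketched like this. The function names are hypothetical and the QN strings used with them are illustrative; the real comparison lives in `cbm_pipeline_find_lsp_resolution`, whose body is not shown here.

```c
#include <assert.h>
#include <string.h>

/* Return the part of `name` after its last "::", or `name` itself when it is
 * unqualified. No allocation: the result points into the input string. */
static const char *last_pkg_segment(const char *name) {
    const char *last = name;
    for (const char *p = strstr(name, "::"); p != NULL; p = strstr(p + 2, "::")) {
        last = p + 2;
    }
    return last;
}

/* Return the part of a dotted QN after its last '.', or the QN itself. */
static const char *last_dot_segment(const char *qn) {
    const char *dot = strrchr(qn, '.');
    return dot ? dot + 1 : qn;
}

/* Hypothetical matcher: a structural callee such as "Pkg::helper" matches a
 * resolved QN whose short name is "helper". */
static int callee_matches(const char *textual_callee, const char *resolved_qn) {
    return strcmp(last_pkg_segment(textual_callee), last_dot_segment(resolved_qn)) == 0;
}
```

Bare method names pass through `last_pkg_segment` unchanged, so the same comparison covers both `Pkg::sub()` and `$obj->m` callees.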

## Files Modified

- `internal/cbm/lsp/perl_lsp.c` — full resolver (process_file two-pass walk,
process_subroutine + $self/$class binding, sigil-aware recursion-guarded
perl_eval_expr_type with bless/new, @ISA/parent/base detection, perl_lookup_method
MRO walk, Exporter import map, function/method call dispatch + perl_emit_resolved,
per-package type + method-table construction), replacing the plan-01 no-op stubs.
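As an illustration of the Exporter import map, the sketch below maps a bare imported name to a dotted QN, so that `use Scalar::Util qw(blessed)` makes `blessed` resolve to `Scalar.Util.blessed`. All names here (`sketch_pkg_to_dot`, `ImportMap`, `import_add`, `import_resolve`) and the fixed buffer sizes are assumptions made for the example; the real resolver uses arena allocation and its own `perl_pkg_to_dot`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Convert "Scalar::Util" to "Scalar.Util". Returns 0 on success, -1 when the
 * output buffer is too small. Hypothetical stand-in for perl_pkg_to_dot. */
static int sketch_pkg_to_dot(const char *pkg, char *out, size_t cap) {
    size_t o = 0;
    if (cap == 0) return -1;
    for (size_t i = 0; pkg[i] != '\0'; i++) {
        if (o + 1 >= cap) return -1; /* keep room for the terminator */
        if (pkg[i] == ':' && pkg[i + 1] == ':') {
            out[o++] = '.';
            i++; /* skip the second colon */
        } else {
            out[o++] = pkg[i];
        }
    }
    out[o] = '\0';
    return 0;
}

#define MAX_IMPORTS 16

typedef struct {
    char short_name[64]; /* bare name as written at the call site */
    char qualified[128]; /* dotted QN, e.g. "Scalar.Util.blessed" */
} Import;

typedef struct {
    Import items[MAX_IMPORTS];
    int count;
} ImportMap;

/* Record one name from `use Module qw(...)`. Returns 0 on success. */
static int import_add(ImportMap *m, const char *module, const char *sub) {
    char dotted[96];
    if (m->count >= MAX_IMPORTS) return -1;
    if (sketch_pkg_to_dot(module, dotted, sizeof dotted) != 0) return -1;
    if (strlen(sub) >= sizeof m->items[0].short_name) return -1;
    Import *it = &m->items[m->count];
    int n = snprintf(it->qualified, sizeof it->qualified, "%s.%s", dotted, sub);
    if (n < 0 || (size_t)n >= sizeof it->qualified) return -1;
    strcpy(it->short_name, sub);
    m->count++;
    return 0;
}

/* Look up a bare call name; NULL means "not imported", so no edge. */
static const char *import_resolve(const ImportMap *m, const char *bare) {
    for (int i = 0; i < m->count; i++) {
        if (strcmp(m->items[i].short_name, bare) == 0) return m->items[i].qualified;
    }
    return NULL;
}
```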
4 changes: 3 additions & 1 deletion Makefile.cbm
@@ -346,6 +346,8 @@ TEST_CS_LSP_SRCS = tests/test_cs_lsp.c

TEST_CS_LSP_BENCH_SRCS = tests/test_cs_lsp_bench.c

TEST_PERL_LSP_SRCS = tests/test_perl_lsp.c

TEST_SCOPE_SRCS = tests/test_scope.c

TEST_TYPE_REP_SRCS = tests/test_type_rep.c
@@ -383,7 +385,7 @@ TEST_SIMHASH_SRCS = tests/test_simhash.c

TEST_STACK_OVERFLOW_SRCS = tests/test_stack_overflow.c

ALL_TEST_SRCS = $(TEST_FOUNDATION_SRCS) $(TEST_EXTRACTION_SRCS) $(TEST_STORE_SRCS) $(TEST_CYPHER_SRCS) $(TEST_MCP_SRCS) $(TEST_DISCOVER_SRCS) $(TEST_GRAPH_BUFFER_SRCS) $(TEST_PIPELINE_SRCS) $(TEST_WATCHER_SRCS) $(TEST_LZ4_SRCS) $(TEST_ZSTD_SRCS) $(TEST_ARTIFACT_SRCS) $(TEST_SQLITE_WRITER_SRCS) $(TEST_GO_LSP_SRCS) $(TEST_C_LSP_SRCS) $(TEST_PHP_LSP_SRCS) $(TEST_CS_LSP_SRCS) $(TEST_CS_LSP_BENCH_SRCS) $(TEST_SCOPE_SRCS) $(TEST_TYPE_REP_SRCS) $(TEST_PY_LSP_SRCS) $(TEST_PY_LSP_BENCH_SRCS) $(TEST_PY_LSP_STRESS_SRCS) $(TEST_PY_LSP_SCALE_SRCS) $(TEST_TS_LSP_SRCS) $(TEST_JAVA_LSP_SRCS) $(TEST_KOTLIN_LSP_SRCS) $(TEST_RUST_LSP_SRCS) $(TEST_TRACES_SRCS) $(TEST_CLI_SRCS) $(TEST_MEM_SRCS) $(TEST_UI_SRCS) $(TEST_HTTPD_SRCS) $(TEST_SECURITY_SRCS) $(TEST_YAML_SRCS) $(TEST_SIMHASH_SRCS) $(TEST_STACK_OVERFLOW_SRCS) $(TEST_INTEGRATION_SRCS)
ALL_TEST_SRCS = $(TEST_FOUNDATION_SRCS) $(TEST_EXTRACTION_SRCS) $(TEST_STORE_SRCS) $(TEST_CYPHER_SRCS) $(TEST_MCP_SRCS) $(TEST_DISCOVER_SRCS) $(TEST_GRAPH_BUFFER_SRCS) $(TEST_PIPELINE_SRCS) $(TEST_WATCHER_SRCS) $(TEST_LZ4_SRCS) $(TEST_ZSTD_SRCS) $(TEST_ARTIFACT_SRCS) $(TEST_SQLITE_WRITER_SRCS) $(TEST_GO_LSP_SRCS) $(TEST_C_LSP_SRCS) $(TEST_PHP_LSP_SRCS) $(TEST_CS_LSP_SRCS) $(TEST_CS_LSP_BENCH_SRCS) $(TEST_PERL_LSP_SRCS) $(TEST_SCOPE_SRCS) $(TEST_TYPE_REP_SRCS) $(TEST_PY_LSP_SRCS) $(TEST_PY_LSP_BENCH_SRCS) $(TEST_PY_LSP_STRESS_SRCS) $(TEST_PY_LSP_SCALE_SRCS) $(TEST_TS_LSP_SRCS) $(TEST_JAVA_LSP_SRCS) $(TEST_KOTLIN_LSP_SRCS) $(TEST_RUST_LSP_SRCS) $(TEST_TRACES_SRCS) $(TEST_CLI_SRCS) $(TEST_MEM_SRCS) $(TEST_UI_SRCS) $(TEST_HTTPD_SRCS) $(TEST_SECURITY_SRCS) $(TEST_YAML_SRCS) $(TEST_SIMHASH_SRCS) $(TEST_STACK_OVERFLOW_SRCS) $(TEST_INTEGRATION_SRCS)


# ── Build directories ────────────────────────────────────────────
4 changes: 4 additions & 0 deletions internal/cbm/cbm.c
@@ -6,6 +6,7 @@
#include "lsp/go_lsp.h"
#include "lsp/c_lsp.h"
#include "lsp/php_lsp.h"
#include "lsp/perl_lsp.h"
#include "lsp/py_lsp.h"
#include "lsp/ts_lsp.h"
#include "lsp/cs_lsp.h"
@@ -609,6 +610,9 @@ CBMFileResult *cbm_extract_file(const char *source, int source_len, CBMLanguage
    if (language == CBM_LANG_PHP) {
        cbm_run_php_lsp(a, result, source, source_len, root);
    }
    if (language == CBM_LANG_PERL) {
        cbm_run_perl_lsp(a, result, source, source_len, root);
    }
    if (language == CBM_LANG_PYTHON) {
        cbm_run_py_lsp(a, result, source, source_len, root);
    }
2 changes: 1 addition & 1 deletion internal/cbm/lang_specs.c
@@ -585,7 +585,7 @@ static const char *perl_func_types[] = {"subroutine_declaration_statement", NULL
static const char *perl_module_types[] = {"source_file", NULL};
static const char *perl_call_types[] = {"ambiguous_function_call_expression",
"function_call_expression", "func1op_call_expression",
-NULL};
+"method_call_expression", NULL};
static const char *perl_import_types[] = {"use_statement", "require_statement", "require", NULL};
static const char *perl_branch_types[] = {"if_statement", "unless_statement", "for_statement",
"foreach_statement", "while_statement", NULL};
140 changes: 140 additions & 0 deletions internal/cbm/lsp/generated/perl_stdlib_data.c
@@ -0,0 +1,140 @@
/*
 * perl_stdlib_data.c: hand-written Perl stdlib + CPAN type data.
 *
 * Strategy mirrors php_stdlib_data.c (docs/PLAN_PHP_LSP_INTEGRATION.md §6):
 *   1. perlfunc core built-ins (print, bless, ref, ...) registered as global,
 *      package-less functions reachable from any namespace.
 *   2. Curated, corpus-driven, commonly imported modules (Scalar::Util,
 *      List::Util, Carp, POSIX, Storable, Data::Dumper) registered as
 *      module-qualified functions.
 *
 * Module-qualified functions use dotted QNs (Foo.Bar.func) to match
 * perl_pkg_to_dot (Foo::Bar -> Foo.Bar) so an Exporter import map
 * (plan 22-03) can resolve `use Scalar::Util qw(blessed)` to these symbols.
 *
 * Most return types are left UNKNOWN (cbm_type_unknown, aliased as MIXED
 * below) for v1; only functions with a fixed scalar result (e.g. sprintf,
 * length, defined) carry a builtin type. Real signature inference is out of
 * scope here: this seed only provides a baseline symbol table for the
 * resolver. Moose meta stubs (has/extends/with) are deferred
 * (Open Question #4).
 */

#include "../type_rep.h"
#include "../type_registry.h"
#include "../../arena.h"
#include "../perl_lsp.h"
#include <string.h>

#define MIXED cbm_type_unknown()

/* Register a global (package-less) built-in function returning `ret_type_`.
 * Reachable from any package — short_name == qualified_name (bare name). */
#define REG_BUILTIN(name_, ret_type_) \
    do { \
        memset(&rf, 0, sizeof(rf)); \
        rf.min_params = -1; \
        rf.qualified_name = (name_); \
        rf.short_name = (name_); \
        { \
            const CBMType **rets = (const CBMType **)cbm_arena_alloc(arena, 2 * sizeof(*rets)); \
            rets[0] = (ret_type_); \
            rets[1] = NULL; \
            rf.signature = cbm_type_func(arena, NULL, NULL, rets); \
        } \
        cbm_registry_add_func(reg, rf); \
    } while (0)

/* Register a module-qualified function (an exported sub, not a method).
 * `module_dot_` is the dotted package QN (e.g. "Scalar.Util"); `name_` is the
 * bare sub name. QN becomes "Scalar.Util.blessed"; short_name stays bare so an
 * Exporter import map can resolve `use Scalar::Util qw(blessed)`. */
#define REG_FUNC(module_dot_, name_, ret_type_) \
    do { \
        memset(&rf, 0, sizeof(rf)); \
        rf.min_params = -1; \
        rf.qualified_name = cbm_arena_sprintf(arena, "%s.%s", (module_dot_), (name_)); \
        rf.short_name = (name_); \
        { \
            const CBMType **rets = (const CBMType **)cbm_arena_alloc(arena, 2 * sizeof(*rets)); \
            rets[0] = (ret_type_); \
            rets[1] = NULL; \
            rf.signature = cbm_type_func(arena, NULL, NULL, rets); \
        } \
        cbm_registry_add_func(reg, rf); \
    } while (0)

void cbm_perl_stdlib_register(CBMTypeRegistry *reg, CBMArena *arena) {
    CBMRegisteredFunc rf;

    /* ── perlfunc core built-ins (global, package-less) ─────────────
     * Source: RESEARCH.md L365 (perldoc perlfunc core list). Reachable from
     * any package; return types are unknown (MIXED) for v1 unless the result
     * is a fixed scalar (string, int, bool). */
    REG_BUILTIN("print", MIXED);
    REG_BUILTIN("printf", MIXED);
    REG_BUILTIN("sprintf", cbm_type_builtin(arena, "string"));
    REG_BUILTIN("open", MIXED);
    REG_BUILTIN("close", MIXED);
    REG_BUILTIN("push", cbm_type_builtin(arena, "int"));
    REG_BUILTIN("pop", MIXED);
    REG_BUILTIN("shift", MIXED);
    REG_BUILTIN("unshift", cbm_type_builtin(arena, "int"));
    REG_BUILTIN("map", MIXED);
    REG_BUILTIN("grep", MIXED);
    REG_BUILTIN("sort", MIXED);
    REG_BUILTIN("join", cbm_type_builtin(arena, "string"));
    REG_BUILTIN("split", MIXED);
    REG_BUILTIN("length", cbm_type_builtin(arena, "int"));
    REG_BUILTIN("substr", cbm_type_builtin(arena, "string"));
    REG_BUILTIN("chomp", MIXED);
    REG_BUILTIN("chop", MIXED);
    REG_BUILTIN("die", MIXED);
    REG_BUILTIN("warn", MIXED);
    REG_BUILTIN("ref", cbm_type_builtin(arena, "string"));
    REG_BUILTIN("bless", MIXED);
    REG_BUILTIN("defined", cbm_type_builtin(arena, "bool"));
    REG_BUILTIN("exists", cbm_type_builtin(arena, "bool"));
    REG_BUILTIN("delete", MIXED);
    REG_BUILTIN("scalar", MIXED);
    REG_BUILTIN("keys", MIXED);
    REG_BUILTIN("values", MIXED);
    REG_BUILTIN("each", MIXED);

    /* ── Scalar::Util ───────────────────────────────────────────────
     * Source: RESEARCH.md L366. Exported subs; module QN "Scalar.Util". */
    REG_FUNC("Scalar.Util", "blessed", MIXED);
    REG_FUNC("Scalar.Util", "reftype", cbm_type_builtin(arena, "string"));
    REG_FUNC("Scalar.Util", "weaken", MIXED);

    /* ── List::Util ─────────────────────────────────────────────────
     * Source: RESEARCH.md L366. Module QN "List.Util". */
    REG_FUNC("List.Util", "sum", MIXED);
    REG_FUNC("List.Util", "max", MIXED);
    REG_FUNC("List.Util", "min", MIXED);
    REG_FUNC("List.Util", "first", MIXED);
    REG_FUNC("List.Util", "reduce", MIXED);

    /* ── Carp ───────────────────────────────────────────────────────
     * Source: RESEARCH.md L367. Module QN "Carp". */
    REG_FUNC("Carp", "croak", MIXED);
    REG_FUNC("Carp", "carp", MIXED);
    REG_FUNC("Carp", "confess", MIXED);
    REG_FUNC("Carp", "cluck", MIXED);

    /* ── POSIX (commonly-imported entry points) ─────────────────────
     * Source: RESEARCH.md L367. Module QN "POSIX". */
    REG_FUNC("POSIX", "floor", MIXED);
    REG_FUNC("POSIX", "ceil", MIXED);
    REG_FUNC("POSIX", "strftime", cbm_type_builtin(arena, "string"));
    REG_FUNC("POSIX", "INT_MAX", cbm_type_builtin(arena, "int"));

    /* ── Storable ───────────────────────────────────────────────────
     * Source: RESEARCH.md L367. Module QN "Storable". */
    REG_FUNC("Storable", "dclone", MIXED);
    REG_FUNC("Storable", "freeze", cbm_type_builtin(arena, "string"));
    REG_FUNC("Storable", "thaw", MIXED);
    REG_FUNC("Storable", "nstore", MIXED);
    REG_FUNC("Storable", "retrieve", MIXED);

    /* ── Data::Dumper ───────────────────────────────────────────────
     * Source: RESEARCH.md L367. Module QN "Data.Dumper". */
    REG_FUNC("Data.Dumper", "Dumper", cbm_type_builtin(arena, "string"));
}