fix(java-extractor): strip inner call args from chained method-call callee #443

Open
tirth8205 wants to merge 2 commits into Egonex-AI:main from tirth8205:fix/java-extractor-chained-call
Conversation

@tirth8205

Contributor

Problem

JavaExtractor.extractMethodInvocationName builds the callee for a qualified call as ${objectNode.text}.${nameNode.text}. When the receiver (the object field) is itself a method_invocation, i.e. a chained / fluent / builder-style call such as repository.findAll().forEach(handler) or builder().build(), objectNode.text is the full source text of the inner call, including its parentheses and arguments.

The result is a malformed callee string that embeds () and argument text, e.g.:

  • builder().build() -> callee "builder().build"
  • repository.findAll().forEach(handler) -> callee "repository.findAll().forEach"

A call-graph callee should be a method name (optionally qualified by a simple receiver), never a string containing (). This pollutes the call graph with un-resolvable, parenthesis-laden callee names for any fluent/builder/stream-style Java code. The existing qualified-call test only passes because System.out.println has a field_access object (System.out), not a method_invocation object.

Fix

Guard the qualified-call branch so it only prefixes the receiver when the object is not itself a method_invocation:

const objectNode = node.childForFieldName("object");
if (objectNode && objectNode.type !== "method_invocation") {
  return `${objectNode.text}.${nameNode.text}`;
}
return nameNode.text;

This keeps System.out.println (object type field_access) and plain calls unchanged, and turns builder().build() into callee build (the inner builder() call still emits its own well-formed entry).
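To make the before/after behavior concrete, here is a minimal sketch that runs without tree-sitter. `MockNode` is a hypothetical stand-in for a syntax node, with `object` and `name` properties playing the role of `childForFieldName("object")` and `childForFieldName("name")`. The empty-string fallback for a missing name is an assumption; the real extractor may behave differently.

```typescript
// Hypothetical stand-in for a tree-sitter node; not the real node API.
interface MockNode {
  type: string;
  text: string;
  object?: MockNode; // stands in for childForFieldName("object")
  name?: MockNode; // stands in for childForFieldName("name")
}

// Behavior described under "Problem": always prefix the receiver's source text.
function calleeBeforeFix(node: MockNode): string {
  if (!node.name) return ""; // assumed fallback, for the sketch only
  return node.object ? `${node.object.text}.${node.name.text}` : node.name.text;
}

// Behavior with the guard shown above: skip method_invocation receivers.
function calleeAfterFix(node: MockNode): string {
  if (!node.name) return ""; // assumed fallback, for the sketch only
  if (node.object && node.object.type !== "method_invocation") {
    return `${node.object.text}.${node.name.text}`;
  }
  return node.name.text;
}

// builder().build()
const chainedCall: MockNode = {
  type: "method_invocation",
  text: "builder().build()",
  object: {
    type: "method_invocation",
    text: "builder()",
    name: { type: "identifier", text: "builder" },
  },
  name: { type: "identifier", text: "build" },
};

// System.out.println(x)
const printlnCall: MockNode = {
  type: "method_invocation",
  text: "System.out.println(x)",
  object: { type: "field_access", text: "System.out" },
  name: { type: "identifier", text: "println" },
};

console.log(calleeBeforeFix(chainedCall)); // builder().build
console.log(calleeAfterFix(chainedCall)); // build
console.log(calleeAfterFix(printlnCall)); // System.out.println
```

The inner `builder()` node has no `object`, so both versions would report it as plain `builder`.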

Testing

  • Added a test under extractCallGraph that parses builder().build(); and asserts a callee "build" exists and that no callee contains "()". This test fails before the fix (the outer call's callee is the malformed "builder().build") and passes after.
  • The full core Vitest suite is green (693 tests), confirming no regressions to existing extractors.
  • tsc --noEmit on packages/core exits 0 and ESLint on both changed files is clean.

🤖 Generated with Claude Code

…allee

For a chained call like `builder().build()`, the method_invocation's
`object` field is itself a method_invocation node, so building the callee
as `${objectNode.text}.${nameNode.text}` produced the malformed callee
"builder().build" (parentheses and inner args embedded in the name).

Guard the qualified-call branch so it only prefixes the receiver when the
object is not itself a method_invocation. Chained calls now yield a clean
method name ("build") for the outer call plus the existing well-formed
entry for the inner call ("builder"). Simple receivers such as
`System.out.println` (object type field_access) are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@thejesh23 thejesh23 left a comment

Contributor

1. Other receiver node types still leak () into the callee.
The guard only excludes method_invocation, but new Foo().bar() (object type object_creation_expression), ((Foo) x).bar() (parenthesized_expression), and (Foo) x.bar() (cast_expression) all have .text containing ()/cast tokens. The PR's own invariant callee.includes("()") === false will be violated by any of these. Consider an allowlist of "simple" receiver types (identifier / field_access / scoped_identifier / this / super) instead of a single-type denylist.

2. super.foo() and explicit-type-args calls aren't covered by tests.
super.foo() (object type super) and obj.<T>foo() / Foo.<T>bar() (type_arguments between object and name) are common in Java but untested here; please add at least a super.bar() assertion so a future grammar/refactor doesn't silently break qualified-super callees.

3. Dropping the receiver hurts call-graph disambiguation.
Replacing builder().build with bare build collapses every fluent terminal into one node, losing the receiver type entirely; downstream consumers can no longer distinguish a.build() vs b.build(). A receiver-typed form like <chain>.build or recursing into the inner method_invocation to extract its name (e.g. builder.build) would preserve more signal. Worth noting since #435's Dart extractor will hit the same chained/cascade shape.

Nit: the new test only asserts presence of "build" and absence of "()"; an explicit assertion that the inner "builder" callee is also emitted would lock in the "inner call still emits its own entry" claim from the PR description.
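The "recurse into the inner method_invocation" idea from point 3 could look roughly like the sketch below. It is not what this PR implements. `ChainNode`, `receiverLabel`, and `qualifiedCallee` are hypothetical names, and the set of plain receiver types is deliberately narrower than the allowlist suggested in point 1 to keep the example short.

```typescript
// Hypothetical stand-in for a tree-sitter node; not the real node API.
interface ChainNode {
  type: string;
  text: string;
  object?: ChainNode; // stands in for childForFieldName("object")
  name?: ChainNode; // stands in for childForFieldName("name")
}

// Receiver types whose text is a single token (an assumption of this sketch).
const PLAIN_RECEIVER_TYPES = new Set(["identifier", "this", "super"]);

// Builds a parenthesis-free label for a receiver, or undefined if none is safe.
function receiverLabel(node: ChainNode): string | undefined {
  if (PLAIN_RECEIVER_TYPES.has(node.type)) return node.text;
  if (node.type === "method_invocation" && node.name) {
    // Use the inner call's method name, never its full source text.
    const prefix = node.object ? receiverLabel(node.object) : undefined;
    return prefix ? `${prefix}.${node.name.text}` : node.name.text;
  }
  return undefined; // object creation, casts, parenthesized expressions, ...
}

function qualifiedCallee(node: ChainNode): string {
  if (!node.name) return ""; // assumed fallback, for the sketch only
  const prefix = node.object ? receiverLabel(node.object) : undefined;
  return prefix ? `${prefix}.${node.name.text}` : node.name.text;
}

// repository.findAll().forEach(handler)
const forEachCall: ChainNode = {
  type: "method_invocation",
  text: "repository.findAll().forEach(handler)",
  object: {
    type: "method_invocation",
    text: "repository.findAll()",
    object: { type: "identifier", text: "repository" },
    name: { type: "identifier", text: "findAll" },
  },
  name: { type: "identifier", text: "forEach" },
};

// new Foo().bar()
const newFooBarCall: ChainNode = {
  type: "method_invocation",
  text: "new Foo().bar()",
  object: { type: "object_creation_expression", text: "new Foo()" },
  name: { type: "identifier", text: "bar" },
};

console.log(qualifiedCallee(forEachCall)); // repository.findAll.forEach
console.log(qualifiedCallee(newFooBarCall)); // bar
```

A label such as `repository.findAll.forEach` keeps the chain visible without embedding `()`, but it still says nothing about the receiver's static type, so `a.build()` and `b.build()` only differ when their receiver expressions differ.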

The chained-call fix used a single-type denylist (`!== method_invocation`),
which still leaked `()`/cast tokens into the callee for other complex
receivers: `new Foo().bar()` (object_creation_expression) and
`((Foo) x).bar()` (parenthesized_expression) both took the qualified branch
and produced callees containing `()`, violating the extractor's own
"callee is a clean identifier" invariant.

Replace the denylist with an allowlist of simple receiver types
(identifier / field_access / scoped_identifier / this / super). Only those
get the `<receiver>.<name>` qualified form; everything else collapses to the
bare method name. Document the intentional receiver-drop and a TODO for
receiver-typed callees / call-graph disambiguation (Egonex-AI#435).

Tests: lock the inner `builder` entry of a chain, no-`()` for
object-creation and parenthesized/cast receivers, and preserved qualified
callees for `super.bar()` / `this.bar()`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tirth8205

Contributor Author

Addressed all four points; pushed as 78a3a8f.

1. Other receiver types leaking ()/cast tokens — fixed. Confirmed against tree-sitter-java that new Foo().bar() has an object_creation_expression receiver and ((Foo) x).bar() a parenthesized_expression receiver, both of which took the old !== method_invocation denylist branch and produced callees containing (). Replaced the denylist with the allowlist you suggested — extractMethodInvocationName now only emits <receiver>.<name> when the object node type is one of identifier / field_access / scoped_identifier / this / super; everything else falls through to the bare method name. Added tests asserting no () (and no cast/paren tokens) for the object-creation and parenthesized/cast cases.

Minor note: (Foo) x.bar() parses as (Foo)(x.bar()) (cast wraps the invocation, whose object is the identifier x), so it was already clean and isn't a separate receiver case — the real offenders were object-creation and parenthesized.

2. super.foo() / explicit type-args — tested. Added super.bar() (asserts super.bar callee preserved) and this.bar() assertions, which also guard the allowlist refactor against regressing those. Verified the obj.<String>foo() shape: the type_arguments node sits between object and name, the object is a plain identifier, so the existing name-field extraction already yields a clean obj.foo — no code change needed there.

3. Receiver drop / call-graph disambiguation. Left as a deliberate tradeoff for this PR and documented it with a TODO in extractMethodInvocationName referencing call-graph disambiguation and #435. The extractor does no receiver-type resolution anywhere; reconstructing a typed receiver (<chain>.build) or recursing into the inner method_invocation to synthesize builder.build is a real feature with grammar-shape assumptions and design implications shared with the Dart cascade work, so it's better scoped separately. Bare build is consistent with the existing "method name only" behavior and strictly better than the malformed builder().build.

Nit. The chained-call test now also asserts result.some((e) => e.callee === "builder"), locking in the "inner call still emits its own entry" claim.

Core suite green: 697 passed; tsc --noEmit clean.
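For readers without the diff open, the allowlist approach can be sketched as follows. `ReceiverNode` and `calleeWithAllowlist` are hypothetical names for illustration, the node shape is a mock rather than the tree-sitter API, and the empty-string fallback is an assumption. Only the five receiver types come from the discussion above.

```typescript
// Hypothetical stand-in for a tree-sitter node; not the real node API.
interface ReceiverNode {
  type: string;
  text: string;
  object?: ReceiverNode; // stands in for childForFieldName("object")
  name?: ReceiverNode; // stands in for childForFieldName("name")
}

// Receiver types that keep the <receiver>.<name> form.
const SIMPLE_RECEIVER_TYPES = new Set([
  "identifier",
  "field_access",
  "scoped_identifier",
  "this",
  "super",
]);

function calleeWithAllowlist(node: ReceiverNode): string {
  if (!node.name) return ""; // assumed fallback, for the sketch only
  if (node.object && SIMPLE_RECEIVER_TYPES.has(node.object.type)) {
    return `${node.object.text}.${node.name.text}`;
  }
  // Any other receiver (chained call, object creation, parenthesized, ...)
  // collapses to the bare method name.
  return node.name.text;
}

// Builds a mock method_invocation whose method name is always "bar".
function barCallOn(receiverType: string, receiverText: string): ReceiverNode {
  return {
    type: "method_invocation",
    text: `${receiverText}.bar()`,
    object: { type: receiverType, text: receiverText },
    name: { type: "identifier", text: "bar" },
  };
}

console.log(calleeWithAllowlist(barCallOn("object_creation_expression", "new Foo()"))); // bar
console.log(calleeWithAllowlist(barCallOn("parenthesized_expression", "((Foo) x)"))); // bar
console.log(calleeWithAllowlist(barCallOn("super", "super"))); // super.bar
console.log(calleeWithAllowlist(barCallOn("this", "this"))); // this.bar
```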

@Lum1104

Lum1104 commented Jun 17, 2026

Collaborator

@codex review this

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78a3a8fd66

*/
private static readonly SIMPLE_RECEIVER_TYPES = new Set([
  "identifier",
  "field_access",

P2: Reject complex field_access receivers

For chained calls that go through a field before the final method, e.g. builder().result.build() or ((Bar) x).result.build(), the final invocation's object is a field_access whose text still includes the complex receiver (builder().result / ((Bar) x).result). Because field_access is allowlisted here, extractMethodInvocationName still returns callees containing () or cast tokens, so the new cleanup only works for direct builder().build() but not common field-in-chain variants.
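One possible way to address this, sketched under assumptions: treat a field_access as simple only when its own object is simple, checked recursively. `FieldNode`, `isSimpleReceiver`, and `calleeWithRecursiveCheck` are hypothetical names, and the sketch assumes a field_access node exposes its receiver through an `object` field, which should be verified against the tree-sitter-java grammar before relying on it.

```typescript
// Hypothetical stand-in for a tree-sitter node; not the real node API.
interface FieldNode {
  type: string;
  text: string;
  object?: FieldNode; // stands in for childForFieldName("object")
  name?: FieldNode; // stands in for childForFieldName("name")
}

// A receiver is simple if it is a bare token, or a field_access chain that
// bottoms out in one (e.g. System.out, but not builder().result).
function isSimpleReceiver(node: FieldNode): boolean {
  switch (node.type) {
    case "identifier":
    case "scoped_identifier":
    case "this":
    case "super":
      return true;
    case "field_access":
      // Conservative: a field_access with no visible object is not simple.
      return node.object !== undefined && isSimpleReceiver(node.object);
    default:
      return false;
  }
}

function calleeWithRecursiveCheck(node: FieldNode): string {
  if (!node.name) return ""; // assumed fallback, for the sketch only
  if (node.object && isSimpleReceiver(node.object)) {
    return `${node.object.text}.${node.name.text}`;
  }
  return node.name.text;
}

// builder().result.build()
const fieldInChainCall: FieldNode = {
  type: "method_invocation",
  text: "builder().result.build()",
  object: {
    type: "field_access",
    text: "builder().result",
    object: { type: "method_invocation", text: "builder()" },
  },
  name: { type: "identifier", text: "build" },
};

// System.out.println(x)
const systemOutCall: FieldNode = {
  type: "method_invocation",
  text: "System.out.println(x)",
  object: {
    type: "field_access",
    text: "System.out",
    object: { type: "identifier", text: "System" },
  },
  name: { type: "identifier", text: "println" },
};

console.log(calleeWithRecursiveCheck(fieldInChainCall)); // build
console.log(calleeWithRecursiveCheck(systemOutCall)); // System.out.println
```

The same check also rejects `((Bar) x).result`, because the field_access there sits on a parenthesized_expression.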

@thejesh23

Contributor

@tirth8205 Overall, the PR looks good to me after the recent commit. I have only tested it locally on a small Java project.
