GLR ambiguity-merge is O(n²) for deeply-nested ambiguous grammars (e.g. Perl), even with the recursion-depth cap

## Summary

tree-sitter's GLR ambiguity-merge (`stack_node_add_link` in `internal/cbm/vendored/ts_runtime/src/stack.c`) exhibits **O(n²)-or-worse** parse time on deeply-nested, grammar-*ambiguous* input — e.g. a Perl `f(f(f(...)))` chain, where the grammar is ambiguous for paren-optional function calls (`ambiguous_function_call_expression`).

This is distinct from the stack-overflow crash fixed in #461: the `CBM_TS_STACK_MERGE_MAX_DEPTH` cap bounds recursion *depth* (preventing SIGSEGV), but does **not** bound the total merge *work*, so parse time still grows superlinearly with nesting depth.

## Evidence (from the #461 effort)

- **Pre-cap:** a 30k-deep ambiguous Perl chain SIGSEGV'd (stack overflow) — even on an 8 MB stack at extreme depth.
- **Post-cap:** no crash, but extracting the same 30k-deep input takes **> 5 minutes** (would time out CI). The same input at depth 2000 completes in **< 1 s** — confirming superlinear growth, not the cap.
- Unambiguous grammars (Java/C++) parse the identical `f(f(...))` depth quickly because they trigger no GLR ambiguity merging.

## Impact

- Pathological/adversarial deeply-nested ambiguous source can make indexing of a single file extremely slow (effectively a per-file DoS via unbounded work). Real-world code rarely nests this deeply, so normal-repo impact is low.
- It forced the #461 regression test (`lsp_perl_deep_expression_no_crash`) to cap depth at 2000, which exercises the cap's code path but no longer reproduces a true crash-without-cap on an 8 MB stack.

## Suggested direction

Bound the **total** merge work, not just recursion depth — e.g. an overall ambiguity-merge budget / iteration cap with graceful degradation consistent with the existing `MAX_LINK_COUNT` bail-out, or memoize/short-circuit redundant merges. Goal: keep parse time near-linear on pathological ambiguous input while still producing a valid parse tree.

## Refs

Surfaced during the Perl LSP work in #461 (the `ts-runtime` GLR merge-depth cap commit and the `lsp_perl_deep_expression_no_crash` test).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GLR ambiguity-merge is O(n²) for deeply-nested ambiguous grammars (e.g. Perl), even with the recursion-depth cap #471

Summary

Evidence (from the #461 effort)

Impact

Suggested direction

Refs

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

GLR ambiguity-merge is O(n²) for deeply-nested ambiguous grammars (e.g. Perl), even with the recursion-depth cap #471

Description

Summary

Evidence (from the #461 effort)

Impact

Suggested direction

Refs

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions