You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
tree-sitter's GLR ambiguity-merge (stack_node_add_link in internal/cbm/vendored/ts_runtime/src/stack.c) exhibits O(n²)-or-worse parse time on deeply-nested, grammar-ambiguous input — e.g. a Perl f(f(f(...))) chain, where the grammar is ambiguous for paren-optional function calls (ambiguous_function_call_expression).
This is distinct from the stack-overflow crash fixed in #461: the CBM_TS_STACK_MERGE_MAX_DEPTH cap bounds recursion depth (preventing SIGSEGV), but does not bound the total merge work, so parse time still grows superlinearly with nesting depth.
Pre-cap: a 30k-deep ambiguous Perl chain SIGSEGV'd (stack overflow) — even on an 8 MB stack at extreme depth.
Post-cap: no crash, but extracting the same 30k-deep input takes > 5 minutes (would time out CI). The same input at depth 2000 completes in < 1 s — confirming superlinear growth, not the cap.
Unambiguous grammars (Java/C++) parse the identical f(f(...)) depth quickly because they trigger no GLR ambiguity merging.
Impact
Pathological/adversarial deeply-nested ambiguous source can make indexing of a single file extremely slow (effectively a per-file DoS via unbounded work). Real-world code rarely nests this deeply, so normal-repo impact is low.
Bound the total merge work, not just recursion depth — e.g. an overall ambiguity-merge budget / iteration cap with graceful degradation consistent with the existing MAX_LINK_COUNT bail-out, or memoize/short-circuit redundant merges. Goal: keep parse time near-linear on pathological ambiguous input while still producing a valid parse tree.
Refs
Surfaced during the Perl LSP work in #461 (the ts-runtime GLR merge-depth cap commit and the lsp_perl_deep_expression_no_crash test).
Summary
tree-sitter's GLR ambiguity-merge (
stack_node_add_linkininternal/cbm/vendored/ts_runtime/src/stack.c) exhibits O(n²)-or-worse parse time on deeply-nested, grammar-ambiguous input — e.g. a Perlf(f(f(...)))chain, where the grammar is ambiguous for paren-optional function calls (ambiguous_function_call_expression).This is distinct from the stack-overflow crash fixed in #461: the
CBM_TS_STACK_MERGE_MAX_DEPTHcap bounds recursion depth (preventing SIGSEGV), but does not bound the total merge work, so parse time still grows superlinearly with nesting depth.Evidence (from the #461 effort)
f(f(...))depth quickly because they trigger no GLR ambiguity merging.Impact
lsp_perl_deep_expression_no_crash) to cap depth at 2000, which exercises the cap's code path but no longer reproduces a true crash-without-cap on an 8 MB stack.Suggested direction
Bound the total merge work, not just recursion depth — e.g. an overall ambiguity-merge budget / iteration cap with graceful degradation consistent with the existing
MAX_LINK_COUNTbail-out, or memoize/short-circuit redundant merges. Goal: keep parse time near-linear on pathological ambiguous input while still producing a valid parse tree.Refs
Surfaced during the Perl LSP work in #461 (the
ts-runtimeGLR merge-depth cap commit and thelsp_perl_deep_expression_no_crashtest).