Skip to content

GLR ambiguity-merge is O(n²) for deeply-nested ambiguous grammars (e.g. Perl), even with the recursion-depth cap #471

@halindrome

Description

@halindrome

Summary

tree-sitter's GLR ambiguity-merge (stack_node_add_link in internal/cbm/vendored/ts_runtime/src/stack.c) exhibits O(n²)-or-worse parse time on deeply-nested, grammar-ambiguous input — e.g. a Perl f(f(f(...))) chain, where the grammar is ambiguous for paren-optional function calls (ambiguous_function_call_expression).

This is distinct from the stack-overflow crash fixed in #461: the CBM_TS_STACK_MERGE_MAX_DEPTH cap bounds recursion depth (preventing SIGSEGV), but does not bound the total merge work, so parse time still grows superlinearly with nesting depth.

Evidence (from the #461 effort)

  • Pre-cap: a 30k-deep ambiguous Perl chain SIGSEGV'd (stack overflow) — even on an 8 MB stack at extreme depth.
  • Post-cap: no crash, but extracting the same 30k-deep input takes > 5 minutes (would time out CI). The same input at depth 2000 completes in < 1 s — confirming superlinear growth, not the cap.
  • Unambiguous grammars (Java/C++) parse the identical f(f(...)) depth quickly because they trigger no GLR ambiguity merging.

Impact

  • Pathological/adversarial deeply-nested ambiguous source can make indexing of a single file extremely slow (effectively a per-file DoS via unbounded work). Real-world code rarely nests this deeply, so normal-repo impact is low.
  • It forced the feat(perl-lsp): Perl LSP-tier semantic resolution (Closes #459) #461 regression test (lsp_perl_deep_expression_no_crash) to cap depth at 2000, which exercises the cap's code path but no longer reproduces a true crash-without-cap on an 8 MB stack.

Suggested direction

Bound the total merge work, not just recursion depth — e.g. an overall ambiguity-merge budget / iteration cap with graceful degradation consistent with the existing MAX_LINK_COUNT bail-out, or memoize/short-circuit redundant merges. Goal: keep parse time near-linear on pathological ambiguous input while still producing a valid parse tree.

Refs

Surfaced during the Perl LSP work in #461 (the ts-runtime GLR merge-depth cap commit and the lsp_perl_deep_expression_no_crash test).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions