
Support tree-sitter-language-pack >= 1.8 (new binding API) #268

@mlieberman85

Summary

tree-sitter-language-pack 1.8+ changed two APIs the threat-model pipeline depends on:

  1. parser.parse() argument type: 1.5/1.6/1.7 accepted bytes; 1.8 requires str and exposes a separate parse_bytes(bytes) method for the old contract.
  2. tree.root_node: 1.5–1.7 was a property returning a Node; 1.8 is a callable method (tree.root_node()).

darnit's threat_model/ code is built against the 1.5.x shape (parser.parse(bytes) + property-style root_node). 23 call sites in production code access tree.root_node as a property, and each of them needs a wrapper or shim before both APIs can be supported.

Reproducer

uv venv .v && source .v/bin/activate
uv pip install tree-sitter-language-pack==1.8.1 tree-sitter==0.25
python -c "
from tree_sitter_language_pack import get_parser
p = get_parser('go')
p.parse(b'package main\nfunc main(){}\n')  # TypeError: source ... 'bytes' not str
"

Real impact: parsing.py:121 raises immediately on the first file the pipeline tries to parse, so discover_all() bails before writing any output. The OSPS-SA-03.02 remediation handler surfaces this as "Executed 1 remediation handler(s)" without a useful traceback.

Stop-gap (current)

packages/darnit-baseline/pyproject.toml pins tree-sitter-language-pack>=1.5,<1.8, so anyone who picks up darnit via pip install or uv tool install resolves to a working version. The pin will land via the PR for fix/parse-source-language-pack-compat.

parse_source() also got a defensive fallback (try bytes → fall back to decoded str on TypeError) so the parse step survives the API split, but the downstream tree.root_node.* accesses still don't.
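For reference, the fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in parsing.py: the name parse_source_compat, its signature, and the two stub parsers are hypothetical and exist only to show the try-bytes-then-str pattern without needing either binding installed.

```python
def parse_source_compat(parser, source: bytes, encoding: str = "utf-8"):
    """Hypothetical helper: try the bytes contract, then fall back to str."""
    try:
        # 1.5-1.7 contract described in this issue: parse() accepts bytes.
        return parser.parse(source)
    except TypeError:
        # 1.8+ contract described in this issue: parse() requires str.
        return parser.parse(source.decode(encoding, errors="replace"))


# Stand-ins for the two binding generations (assumptions, not real parsers).
class BytesOnlyParser:
    def parse(self, source):
        if not isinstance(source, bytes):
            raise TypeError("expected bytes")
        return ("tree", source)


class StrOnlyParser:
    def parse(self, source):
        if not isinstance(source, str):
            raise TypeError("expected str")
        return ("tree", source)


legacy_result = parse_source_compat(BytesOnlyParser(), b"package main\n")
new_result = parse_source_compat(StrOnlyParser(), b"package main\n")
```

One caveat with this pattern: catching TypeError around the whole call can also hide an unrelated TypeError raised inside parse(), which is part of why it is only a stop-gap.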

Proper fix (this issue)

To support 1.8+, every tree.root_node access in threat_model/ needs to handle both forms. Sketch:

def get_root_node(tree):
    """Compat: tree-sitter-language-pack 1.8+ made root_node a method."""
    rn = tree.root_node
    return rn() if callable(rn) else rn
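
A quick way to sanity-check the helper without installing either binding is to run it against two stub trees. LegacyTree and NewTree below are hypothetical stand-ins for the 1.5.x and 1.8+ shapes. The sketch also assumes that Node objects are not themselves callable; otherwise the callable() check would misfire on the property form.

```python
def get_root_node(tree):
    """Compat: tree-sitter-language-pack 1.8+ made root_node a method."""
    rn = tree.root_node
    return rn() if callable(rn) else rn


class LegacyTree:
    """Stub for the 1.5-1.7 shape: root_node is a property."""

    def __init__(self, node):
        self._node = node

    @property
    def root_node(self):
        return self._node


class NewTree:
    """Stub for the 1.8+ shape: root_node is a method."""

    def __init__(self, node):
        self._node = node

    def root_node(self):
        return self._node


node = object()  # placeholder for a real Node; deliberately not callable
legacy_root = get_root_node(LegacyTree(node))
new_root = get_root_node(NewTree(node))
```

Call sites would then change from tree.root_node.children to get_root_node(tree).children, and so on.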

Touchpoints (23 of them, all under packages/darnit-baseline/src/darnit_baseline/threat_model/):

  • parsing.py (the parse_source caller's check + queries)
  • ts_discovery.py (~20 call sites across all extractors)
  • grouping.py, ranking.py, ts_generators.py (a handful each)

The upgrade also makes parse_bytes() available as the canonical bytes entry point. We should switch the parser call to parse_bytes when it is available and stop straddling the two APIs.
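
A possible shape for that switch, as a sketch only: the helper name parse_bytes_compat and the stub parsers are hypothetical, and the only API facts assumed are the ones stated in this issue (parse_bytes(bytes) exists on 1.8+, parse(bytes) works on 1.5 to 1.7).

```python
def parse_bytes_compat(parser, source: bytes):
    """Hypothetical helper: prefer parse_bytes() when the parser has it."""
    parse_bytes = getattr(parser, "parse_bytes", None)
    if callable(parse_bytes):
        # 1.8+ per this issue: dedicated bytes entry point.
        return parse_bytes(source)
    # 1.5-1.7 per this issue: parse() itself accepts bytes.
    return parser.parse(source)


# Stand-ins for the two binding generations (assumptions, not real parsers).
class LegacyParser:
    def parse(self, source):
        return ("parse", source)


class NewParser:
    def parse(self, source):
        if not isinstance(source, str):
            raise TypeError("expected str")
        return ("parse", source)

    def parse_bytes(self, source):
        return ("parse_bytes", source)


legacy_call = parse_bytes_compat(LegacyParser(), b"package main\n")
new_call = parse_bytes_compat(NewParser(), b"package main\n")
```

Feature detection with getattr avoids the try/except TypeError round trip and keeps the source as bytes on both versions, so no decode step is needed.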

Acceptance

  • pyproject.toml removes the <1.8 cap.
  • darnit installs cleanly with tree-sitter-language-pack==1.8.1 (the latest at the time of this issue) and runs generate_threat_model successfully against the same fixtures + gittuf reference.
  • Existing tests pass against both 1.5.x and 1.8.x bindings; ideally via a tox-style or matrix test.

Related
