feat(extraction): add Terraform and OpenTofu language support #706
Open
Javviviii2 wants to merge 1 commit into
Index .tf, .tfvars, and .tofu files via the tree-sitter-terraform dialect of HCL (vendored from @tree-sitter-grammars/tree-sitter-hcl, Apache-2.0).

Symbols extracted:
- resource / data → class (qualified "type.name" / "data.type.name")
- module → module (qualified "module.name")
- variable → variable (qualified "var.name")
- output → variable (qualified "output.name")
- provider → namespace
- locals → constant per attribute (qualified "local.key")

References resolved cross-file:
- var.X, local.X, module.M[.out], data.T.N[.attr], <type>.<name>[.attr]
- built-ins skipped: each.*, count.*, self.*, path.*, terraform.workspace

The Terraform framework resolver disambiguates same-named candidates across modules by preferring the one in the same directory as the reference site, then by closest common-ancestor path, falling back to the generic name matcher only when neither applies.

Validated on two Terraform monorepos (277 and 470 .tf files): indexing runs in 1.3s and 2.4s respectively, query latency stays under 200ms, and cross-module references resolve to the correct module 100% of the time on inspected samples.

18 new extraction tests; full suite 1146/1148 green (2 pre-existing flaky skips, 0 regressions).
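The directory-preference rule described in the commit message could be sketched roughly as follows. This is an illustration only, not the code in this PR: the `Candidate` shape, the helper names, and the reading of "closest common-ancestor path" as "deepest shared directory prefix" are all assumptions, and the paths in any usage are hypothetical apart from `modules/net-vpc/main.tf`.

```typescript
// Hypothetical candidate shape; the real CodeGraph resolver types are not shown in this PR.
interface Candidate {
  qualifiedName: string; // e.g. "var.project_id"
  filePath: string;      // POSIX-style path relative to the repo root
}

// Directory portion of a POSIX-style path, as a list of segments.
function dirSegments(filePath: string): string[] {
  const parts = filePath.split("/").filter((p) => p.length > 0);
  return parts.slice(0, -1);
}

// Number of leading directory segments two paths share.
function sharedDepth(a: string[], b: string[]): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Returns the preferred candidate, or undefined to signal
// "neither rule applies, fall back to the generic name matcher".
function pickCandidate(refFile: string, candidates: Candidate[]): Candidate | undefined {
  const refDir = dirSegments(refFile);
  const refDirKey = refDir.join("/");

  // Rule 1: a candidate declared in the same directory as the reference site.
  const sameDir = candidates.filter((c) => dirSegments(c.filePath).join("/") === refDirKey);
  if (sameDir.length > 0) return sameDir[0];

  // Rule 2 (assumed interpretation): the candidate sharing the deepest
  // directory prefix with the reference site. Ties and zero overlap give up.
  let best: Candidate | undefined;
  let bestDepth = 0;
  let tie = false;
  for (const c of candidates) {
    const depth = sharedDepth(refDir, dirSegments(c.filePath));
    if (depth > bestDepth) {
      best = c;
      bestDepth = depth;
      tie = false;
    } else if (depth === bestDepth && depth > 0) {
      tie = true;
    }
  }
  return tie ? undefined : best;
}
```

Returning `undefined` rather than guessing keeps the fallback to the generic matcher explicit, which matches the "only when neither applies" wording above.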
What this changes
Adds Terraform / OpenTofu as a first-class indexed language.
`.tf`, `.tfvars`, and `.tofu` files are now parsed into the graph, so `codegraph_search`, `codegraph_callers`, `codegraph_callees`, and `codegraph_impact` return real results on infrastructure repos instead of nothing.

Today, opening any Terraform monorepo with CodeGraph indexes 0 nodes / 0 edges because there is no language extractor: files are skipped after detection. After this PR, the same repo gets a full symbol graph.

How
Three small pieces:

1. Grammar: vendors `tree-sitter-terraform.wasm` (92 KB) built from `@tree-sitter-grammars/tree-sitter-hcl@1.2.0` (Apache-2.0). I picked the `terraform` dialect rather than generic `hcl` because it targets `.tf` / `.tfvars` / `.tofu` exactly. The wasm is referenced through the existing `path.join(__dirname, 'wasm', ...)` vendor branch (same as Pascal/Scala/Lua/Luau), so no new npm dependency.

2. `LanguageExtractor` (`src/extraction/languages/terraform.ts`): HCL's grammar emits every top-level construct as a generic `block` node, so the extractor drives everything through `visitNode`, inspecting the first `identifier` child to decide the block kind:

   | Block | Node kind | Qualified name |
   | --- | --- | --- |
   | `resource "T" "N"` | class | `T.N` |
   | `data "T" "N"` | class | `data.T.N` |
   | `module "M"` | module | `module.M` |
   | `variable "X"` | variable | `var.X` |
   | `output "X"` | variable | `output.X` |
   | `provider "P"` | namespace | `provider.P` |
   | `locals { k = … }` | constant per attr | `local.k` |

   References (`var.X`, `module.M.out`, `data.T.N.attr`, `<type>.<name>.attr`, `local.k`) are emitted as `unresolved_refs`. Built-in heads (`each`, `count`, `self`, `path`, `terraform.workspace`) are filtered.

3. `FrameworkResolver` (`src/resolution/frameworks/terraform.ts`): when the same qualified name (e.g. `var.project_id`) exists in multiple modules, the resolver prefers:
   - the candidate in the same directory as the reference site;
   - then the candidate with the closest common-ancestor path;
   - for `module.M.x` refs, the directory containing a `module "M"` declaration.

Validation
I validated against two real Terraform monorepos. All numbers measured locally on Node 22.22 / Linux.
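The two repos contain 277 and 470 `.tf` files; indexing runs in 1.3 s and 2.4 s respectively.

Post-index query latency on the 470-file repo stays under 200 ms.

Cross-module precision: a same-named variable used to bleed across modules with the generic matcher; it is now scoped correctly. The spot-check used `var.project_id` in repo A, referenced from `modules/net-vpc/main.tf`; on the inspected samples, cross-module references resolve to the correct module 100% of the time.

Tests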
`__tests__/extraction.test.ts`: 18 new tests under `describe('Terraform Extraction')` covering language detection, all six block kinds, locals attribute fan-out, the `terraform { }` settings block (correctly ignored), reference extraction for `var` / `local` / `module` / `data` / managed-resource heads, and the built-in skip list.

Full suite: 1146 passed / 2 skipped, 0 regressions (baseline was 1128 passed / 2 skipped + 1 pre-existing flaky watcher test).
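As a rough illustration of the behaviour these tests exercise, the block-kind mapping and the built-in skip list could be modelled like this. It is a standalone sketch, not the extractor in this PR: `qualify`, `refTarget`, and `BUILTIN_HEADS` are hypothetical names, references are handled as plain dotted strings rather than tree-sitter nodes, and mapping `module.M.out` to `module.M` is an assumption.

```typescript
type NodeKind = "class" | "module" | "variable" | "namespace";

// Maps a labelled block (e.g. resource "T" "N") to its node kind and qualified name.
// `locals` is handled per attribute elsewhere; `terraform { }` and unknown blocks are ignored.
function qualify(blockType: string, labels: string[]): { kind: NodeKind; name: string } | undefined {
  const [first, second] = labels;
  if (first === undefined) return undefined;
  switch (blockType) {
    case "resource":
      return second === undefined ? undefined : { kind: "class", name: `${first}.${second}` };
    case "data":
      return second === undefined ? undefined : { kind: "class", name: `data.${first}.${second}` };
    case "module":
      return { kind: "module", name: `module.${first}` };
    case "variable":
      return { kind: "variable", name: `var.${first}` };
    case "output":
      return { kind: "variable", name: `output.${first}` };
    case "provider":
      return { kind: "namespace", name: `provider.${first}` };
    default:
      return undefined;
  }
}

// Heads that never point at a user-declared symbol.
const BUILTIN_HEADS = new Set(["each", "count", "self", "path"]);

// Reduces a dotted reference to the qualified name it targets,
// or undefined when the reference is a built-in and should be skipped.
function refTarget(traversal: string): string | undefined {
  const parts = traversal.split(".");
  const head = parts[0] ?? "";
  const second = parts[1];
  const third = parts[2];
  if (BUILTIN_HEADS.has(head)) return undefined;
  if (head === "terraform" && second === "workspace") return undefined;
  if (second === undefined || second === "") return undefined;
  switch (head) {
    case "var":
    case "local":
    case "module":
      return `${head}.${second}`; // trailing attributes such as ".out" are dropped
    case "data":
      return third === undefined ? undefined : `data.${second}.${third}`;
    default:
      return `${head}.${second}`; // managed resource: <type>.<name>
  }
}
```

For example, `refTarget("module.vpc.network_id")` yields `module.vpc`, while `refTarget("each.value")` yields `undefined` and would emit no unresolved reference.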
Notes for review
- Registration is minimal: one entry in `LANGUAGES`, one in `EXTRACTORS`, one in `FRAMEWORK_RESOLVERS`, one extension mapping.
- Changelog entry added under `## [Unreleased]` → `### New Features`, written user-facing per the house rule (no internal paths or symbol names).
- Grammar attribution recorded in `grammars.ts` (Apache-2.0, source repo and source release noted).
- I did not run the `### Validation methodology` formal A/B agent run from `docs/design/dynamic-dispatch-coverage-playbook.md`, since I don't have access to the small/medium/large benchmark repo set referenced there. Happy to follow up with that as a separate PR if you'd like to add Terraform to the matrix; I can pick a public Terraform repo per size tier.

Thanks for the great project. The architecture made it really clean to extend.