feat(extraction): add ArkTS language support #656
Resolves #648
I think ArkUI framework-aware route optimization is also needed. Tree-sitter is the first step for code search in ArkTS repos, since closures are not easy for agents working with ArkUI to follow.
I agree: tree-sitter is just the first step for ArkTS repos. In ArkUI, most of the “logic” lives in @State/@Prop fields and @Builder closures, and UI event handlers (onClick, onChange, etc.) form chains that are really hard for agents to follow if we only have plain call edges. I plan to add ArkUI framework-aware routes and state/event chains, roughly:
Add full ArkUI (HarmonyOS declarative UI) support across the detection,
routing, resolution, and synthesis layers.
## What's new
### Framework resolver (`src/resolution/frameworks/arkui.ts`)
- `detect()`: identify projects via `build-profile.json5` or `@Entry` in .ets files
- `extract()`: 3-pass scanning for `@Entry` pages, router.pushUrl/replaceUrl
navigation refs, and `@Component` structs — with decorator param support
(e.g. `@Entry({ routeName: 'main' })`)
- `resolve()`: 3-tier match strategy (filePath → qualifiedName → name suffix)
with path-prefix anchoring to prevent false matches
- `postExtract()`: ingest `main_pages.json` for HarmonyOS 5.0+ route declarations
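
The 3-tier `resolve()` strategy listed above can be sketched roughly as follows. This is a minimal illustration, not the code from `arkui.ts`: the `PageNode` shape and the `resolvePageRef` name are hypothetical, and "path-prefix anchoring" is interpreted here as requiring a match to start at a path-segment boundary.

```typescript
// Hypothetical node shape; the real codegraph node type may differ.
interface PageNode {
  name: string;
  qualifiedName: string;
  filePath: string;
}

// Resolve a navigation ref such as 'pages/Detail' to a page node.
// Tiers: file path, then qualified name, then name suffix.
function resolvePageRef(ref: string, nodes: PageNode[]): PageNode | undefined {
  const normalized = ref.replace(/^\/+/, "").replace(/\.ets$/, "");

  // Tier 1: file path. The leading "/" anchors the match at a segment
  // boundary, so 'pages/Detail' does not match '.../subpages/Detail.ets'.
  const byPath = nodes.find((n) => {
    const p = n.filePath.replace(/\.ets$/, "");
    return p === normalized || p.endsWith("/" + normalized);
  });
  if (byPath) return byPath;

  // Tier 2: exact qualified name.
  const byQualified = nodes.find((n) => n.qualifiedName === normalized);
  if (byQualified) return byQualified;

  // Tier 3: the last segment of the ref equals the node name.
  const last = normalized.split("/").pop();
  return nodes.find((n) => n.name === last);
}
```

Ordering the tiers from most to least specific means the loose name-suffix match only applies when nothing stricter is available.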
### New NodeKind: `arkui_page` (1 addition — no other kinds changed)
- Added to `NODE_KINDS`, `HIGH_VALUE_NODE_KINDS`, and `kindBonus` (weight 9)
- Pages are treated as aggregation units; components, functions, and properties
reuse existing `component`, `function`, and `property` kinds
### Synthesis edges (3 phases, 0 new EdgeKinds)
All use `kind: 'calls'` + `provenance: 'heuristic'` + `metadata.synthesizedBy`:
| Phase | synthesizedBy | Source → Target | Purpose |
|-------|---------------|-----------------|---------|
| A | `arkui-state-chain` | sibling method → `build()` | State change → re-render bridge |
| B | `arkui-state-dep` | method → `@State` property | Marks which methods read reactive state |
| C | `arkui-event-chain` | `build()` → handler method | `.onClick(this.handler)` → handler wiring |
All edges are transparent to standard traversal tools (callers, callees, explore).
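
As an illustration of why no new EdgeKind is needed, here is a minimal sketch of what a synthesized edge might look like. Only `kind`, `provenance`, and `metadata.synthesizedBy` come from the description above; the `source`/`target` fields, the `makeSynthEdge` and `callees` helpers, and the `Counter.*` identifiers are hypothetical.

```typescript
type SynthesizedBy =
  | "arkui-state-chain"
  | "arkui-state-dep"
  | "arkui-event-chain";

// Hypothetical edge shape; the real codegraph edge type likely has more fields.
interface SynthEdge {
  source: string;
  target: string;
  kind: "calls";
  provenance: "heuristic";
  metadata: { synthesizedBy: SynthesizedBy };
}

function makeSynthEdge(
  source: string,
  target: string,
  synthesizedBy: SynthesizedBy
): SynthEdge {
  return {
    source,
    target,
    kind: "calls",
    provenance: "heuristic",
    metadata: { synthesizedBy },
  };
}

// One edge per phase for an imaginary Counter component.
const edges: SynthEdge[] = [
  makeSynthEdge("Counter.increment", "Counter.build", "arkui-state-chain"), // Phase A
  makeSynthEdge("Counter.increment", "Counter.count", "arkui-state-dep"), // Phase B
  makeSynthEdge("Counter.build", "Counter.increment", "arkui-event-chain"), // Phase C
];

// A generic traversal that filters on `kind` alone sees the synthesized
// edges like any other call edge.
function callees(of: string, all: SynthEdge[]): string[] {
  return all
    .filter((e) => e.kind === "calls" && e.source === of)
    .map((e) => e.target);
}
```

A consumer that wants to hide or label heuristic edges can still filter on `provenance` or `metadata.synthesizedBy`.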
### Annotation output
- `codegraph_node` trail: `[dynamic: Arkui state chain via handleOK @file:42]`
- `codegraph_explore` flow spine and dynamic-dispatch links: labeled as
`Arkui Click → handleClick`, `Arkui reads @State count`, etc.
## Key fixes included
- Query both `class` and `struct` kinds in synthesis phases (ArkUI structs are
tree-sitter kind `struct`, not `class`)
- Phase C pre-filter covers all 11 event types (including drag events)
- Decorator regex supports params: `@Entry({ routeName: 'main' })`
- Removed dead qualifiedName dedup path in `postExtract`
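
The decorator-parameter fix can be illustrated with a small sketch. The pattern below is an assumption written for this example, not the regex from the PR; it does not handle nested parentheses inside the parameter list, and `findDecorators` is a hypothetical helper.

```typescript
interface DecoratorMatch {
  name: string;
  params?: string; // raw text between the parentheses, if any
}

// Assumed pattern: a decorator name, optionally followed by "(...)".
const DECORATOR_PATTERN = "@(Entry|Component)\\b(?:\\s*\\(([^)]*)\\))?";

function findDecorators(source: string): DecoratorMatch[] {
  const re = new RegExp(DECORATOR_PATTERN, "g");
  const found: DecoratorMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const name = m[1] ?? "";
    const params = m[2];
    found.push(
      params === undefined ? { name } : { name, params: params.trim() }
    );
  }
  return found;
}

// Both the bare and the parameterized forms are recognized.
const sample = "@Entry({ routeName: 'main' })\n@Component\nstruct Main {}";
const decorators = findDecorators(sample);
```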
## Tests
- `__tests__/arkui-framework.test.ts`: 29 cases covering extract, postExtract,
resolve, and all 3 synthesis phases
- Full regression: 1159 pass / 0 fail / 60 test files
feat(extraction): add ArkTS language support
## Summary

Adds tree-sitter-based extraction support for ArkTS, the TypeScript superset used in Huawei's HarmonyOS / ArkUI application development. ArkTS files use the `.ets` extension and introduce the `struct` keyword for component definitions (`@Component struct X { ... }`), along with ArkUI-specific decorators (`@State`, `@Prop`, `@Link`, `@Builder`, etc.).

## Changes

| File | Change |
|------|--------|
| `src/types.ts` | `'arkts'` in `LANGUAGES` |
| `src/extraction/grammars.ts` | `.ets` → `'arkts'` extension mapping, WASM grammar registration, local-wasm loading path, and display name |
| `src/extraction/languages/arkts.ts` | `LanguageExtractor` (extends TypeScript extractor, adds `structTypes: ['struct_declaration']`) |
| `src/extraction/languages/index.ts` | `arktsExtractor` |
| `src/extraction/wasm/tree-sitter-arkts.wasm` | (`tree-sitter-arkts-open`) |

No changes were needed to the core extractor (`tree-sitter.ts`, `parse-worker.ts`, `tree-sitter-types.ts`). The existing `LanguageExtractor` interface and `structTypes` dispatch in `visitNode` handle ArkTS out of the box. ArkTS-specific decorators (`@Component`, `@State`, etc.) are already captured by the shared `extractDecoratorsFor` path.

## Architecture

ArkTS is treated as a distinct language (not a TypeScript variant) because its grammar produces unique AST node types (`struct_declaration`). The extractor extends the TypeScript extractor via object spread and overrides only `structTypes`:

The dispatch chain in `TreeSitterExtractor.visitNode` flows:

## Verification

Tested against a real HarmonyOS application (233 files, 228 `.ets`):

All `.ets` files parse without errors. `struct` nodes (e.g. `SpecialDetailScreen`, `MainPage`) are correctly classified with their `build()` and lifecycle methods, decorators, and containment edges.
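
The object-spread approach described under Architecture might look roughly like the sketch below. The `LanguageExtractor` interface here is a heavily simplified stand-in for the real one, and `classify` is a hypothetical helper standing in for the `structTypes` dispatch in `visitNode`; only `structTypes: ['struct_declaration']` is taken from the PR description.

```typescript
// Simplified, hypothetical interface; the real one has many more fields.
interface LanguageExtractor {
  functionTypes: string[];
  classTypes: string[];
  structTypes: string[];
}

// Stand-in for the existing TypeScript extractor (illustrative values).
const typescriptExtractor: LanguageExtractor = {
  functionTypes: ["function_declaration", "method_definition"],
  classTypes: ["class_declaration"],
  structTypes: [],
};

// ArkTS reuses everything from TypeScript and overrides only structTypes.
const arktsExtractor: LanguageExtractor = {
  ...typescriptExtractor,
  structTypes: ["struct_declaration"],
};

// Hypothetical dispatch: map an AST node type to a node kind.
function classify(
  nodeType: string,
  extractor: LanguageExtractor
): "function" | "class" | "struct" | undefined {
  if (extractor.functionTypes.includes(nodeType)) return "function";
  if (extractor.classTypes.includes(nodeType)) return "class";
  if (extractor.structTypes.includes(nodeType)) return "struct";
  return undefined;
}
```

Because the spread is shallow, the ArkTS extractor shares every other field with the TypeScript extractor, so later improvements to the TypeScript extractor would carry over without extra wiring.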