refactor: use AST parsing for complex export name extraction#339
refactor: use AST parsing for complex export name extraction#339cernymatej wants to merge 4 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughReplaces per-export string parsing with batched AST-based export-name extraction using oxc-parser, adds Changes
Sequence Diagram(s)sequenceDiagram
participant Analyzer as Analyzer (src/analyze.ts)
participant Parser as oxc-parser (parseSync)
participant Mapper as NameFiller
participant TestRunner as Test Runner
Analyzer->>Parser: send batched code slices via parseSync
Parser-->>Analyzer: return ASTs / export identifiers
Analyzer->>Mapper: map AST results to export entries
Mapper-->>Analyzer: filled export names/specifiers
Analyzer->>TestRunner: produce export lists for assertions
TestRunner-->>Analyzer: validate results (pass/fail)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #339 +/- ##
==========================================
- Coverage 88.04% 82.71% -5.33%
==========================================
Files 8 7 -1
Lines 1062 457 -605
Branches 188 135 -53
==========================================
- Hits 935 378 -557
+ Misses 127 72 -55
- Partials 0 7 +7 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/analyze.ts (1)
624-629: Non-null assertion onexportName.namecould throw at runtime.The comment at line 627 states bare star exports won't be parsed, but if an unexpected export type slips through (e.g., due to code evolution or edge cases), the non-null assertion
e.exportName.name!will throw an unclear runtime error.Consider adding a defensive filter or a more explicit error.
Defensive approach
return module.staticExports.map((e) => e.entries .filter((e) => keepTypes || !e.isType) - // safe to non-null assert because we don't parse bare star exports without specifiers `export * from 'module'` - .map((e) => e.exportName.name!), + .map((e) => e.exportName.name) + .filter((name): name is string => name != null), );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/analyze.ts` around lines 624 - 629, The code uses a non-null assertion on exportName.name inside module.staticExports.map which can throw if an unexpected export shape appears; update the pipeline that iterates e.entries to defensively handle missing exportName or name by either filtering entries where !e.exportName || !e.exportName.name (preserving the keepTypes logic) or by mapping to a value using e.exportName?.name and then filtering out falsy values, or alternatively, throw a clear, descriptive error that includes module/static export context when exportName.name is absent; apply this change to the entries iteration so exportName.name is never blindly asserted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/analyze.ts`:
- Around line 624-629: The code uses a non-null assertion on exportName.name
inside module.staticExports.map which can throw if an unexpected export shape
appears; update the pipeline that iterates e.entries to defensively handle
missing exportName or name by either filtering entries where !e.exportName ||
!e.exportName.name (preserving the keepTypes logic) or by mapping to a value
using e.exportName?.name and then filtering out falsy values, or alternatively,
throw a clear, descriptive error that includes module/static export context when
exportName.name is absent; apply this change to the entries iteration so
exportName.name is never blindly asserted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4b799140-d965-4431-8c67-ba9c21ac583c
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (4)
build.config.tspackage.jsonsrc/analyze.tstest/exports.test.ts
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/analyze.ts`:
- Around line 604-609: The _parseNamedExportsRe helper currently can’t handle
mixed inline `type` modifiers (e.g. `export { type Foo, bar }`) because
keepTypes is a boolean; change _parseNamedExportsRe to return per-entry metadata
(e.g. objects with { name, isType }) instead of only names so callers can
distinguish `type`-prefixed entries, update any callers such as findExports and
findTypeExports to consume the metadata (or add a typesOnly flag to
_parseNamedExportsRe if you prefer minimal change) and use the isType flag to
route names to the correct API; ensure you still strip `as` aliases (the
existing .replace(/^.*?\sas\s/, "") logic) and keep referencing TYPE_RE to
detect type tokens.
- Around line 422-456: The code is passing raw regex-derived slices to
_batchParseExportNames(parseSync) which can be truncated or include matches
inside strings/comments and causes the parser to throw; modify the block that
builds needsParse and calls _batchParseExportNames so that the call is wrapped
in a try-catch around _batchParseExportNames(needsParse.map(e => e.code)), and
on error fall back to a safe strategy (e.g., use _parseNamedExportsRe or other
regex-based parsing for those specific entries, or compute slices from actual
statement bounds instead of EXPORT_* regexes); ensure
declaredExports/namedExports/destructuredExports entries keep a valid .names
value (use existing helpers like _parseNamedExportsRe) so downstream
_filterStatement and parseSync counts remain consistent.
- Line 621: The call to parseSync currently uses "input.ts" which disables JSX
parsing and breaks cases handled by _batchParseExportNames for comma-separated
JSX/TSX exports; update the parseSync invocation that parses the combined source
(variable combined) to either use "input.tsx" as the filename or pass the option
{ lang: "tsx" } so the oxc-parser accepts JSX/TSX syntax when parsing in
parseSync("input.ts", combined, ...), ensuring JSX exports are parsed correctly.
| // Collect all code slices that need AST parsing and batch them into a single parseSync call | ||
| const needsParse: ESMExport[] = []; | ||
|
|
||
| for (const declaredExport of declaredExports) { | ||
| const extraNamesStr = (declaredExport as any).extraNames as | ||
| | string | ||
| | undefined; | ||
| delete (declaredExport as any).extraNames; | ||
| if (!EXPORT_FUNCTION_RE.test(declaredExport.code) && extraNamesStr) { | ||
| // Extra names can contain commas inside values: `export const foo = [1, 2], bar = 3`, | ||
| // `export const foo = fn(a, b), bar = 3`, generics in type annotations, etc. | ||
| needsParse.push(declaredExport); | ||
| } | ||
| } | ||
|
|
||
| for (const namedExport of namedExports) { | ||
| // String literals in export names can contain commas: `export { foo as "bar, baz" }` (ES2022) | ||
| if (EXPORT_STRING_LITERAL_RE.test(namedExport.exports)) { | ||
| needsParse.push(namedExport); | ||
| } else { | ||
| namedExport.names = _parseNamedExportsRe(namedExport.exports); | ||
| } | ||
| } | ||
|
|
||
| // Default values in destructuring can contain commas: `export const { foo = fn(a, b) } = obj` | ||
| for (const destructuredExport of destructuredExports) { | ||
| needsParse.push(destructuredExport); | ||
| } | ||
|
|
||
| if (needsParse.length > 0) { | ||
| // Get the export names for each export | ||
| const parsedExports = _batchParseExportNames(needsParse.map((e) => e.code)); | ||
| for (const [i, target] of needsParse.entries()) { | ||
| target.names = parsedExports[i]; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Get the file size to determine how to read it
wc -l src/analyze.tsRepository: unjs/mlly
Length of output: 71
🏁 Script executed:
# Look at the specific lines mentioned in the review (422-456)
sed -n '422,456p' src/analyze.ts | cat -nRepository: unjs/mlly
Length of output: 1717
🏁 Script executed:
# Search for EXPORT_DECAL_RE definition and related regex patterns
rg "EXPORT_DECAL_RE|EXPORT_FUNCTION_RE|EXPORT_STRING_LITERAL_RE" -n src/analyze.ts | head -30Repository: unjs/mlly
Length of output: 493
🏁 Script executed:
# Search for _filterStatement definition to understand when it runs
rg "_filterStatement" -n src/analyze.ts | head -20Repository: unjs/mlly
Length of output: 272
🏁 Script executed:
# Let's also look at the broader context to understand the flow
sed -n '400,470p' src/analyze.ts | cat -nRepository: unjs/mlly
Length of output: 2910
🏁 Script executed:
# Get the EXPORT_DECAL_RE definition
sed -n '263,284p' src/analyze.ts | cat -nRepository: unjs/mlly
Length of output: 1862
🏁 Script executed:
# Get the _filterStatement function definition
sed -n '693,737p' src/analyze.ts | cat -nRepository: unjs/mlly
Length of output: 1563
🏁 Script executed:
# Run the verification script from the review comment to test regex behavior
node <<'NODE'
const re = /\bexport\s+(?<declaration>(?:async function\s*\*?|function\s*\*?|let|const enum|const|enum|var|class))\s+\*?(?<name>[\w$]+)(?<extraNames>.*,\s*[\s\w:[\]{}]*[\w$\]}]+)*/g;
for (const code of [
'export const foo = fn(a, b)',
'export const foo = [1, 2]',
'const s = "export const foo = fn(a, b)"',
]) {
re.lastIndex = 0;
const match = re.exec(code);
console.log(JSON.stringify({
code,
match: match?.[0],
extraNames: match?.groups?.extraNames,
}));
}
NODERepository: unjs/mlly
Length of output: 368
🏁 Script executed:
# Look at matchAll function to understand how code is extracted
rg "function matchAll|const matchAll" -A 30 src/analyze.tsRepository: unjs/mlly
Length of output: 35
🏁 Script executed:
# Search for matchAll function to see how code slices are created
rg "function matchAll|export.*matchAll" -B 5 -A 25 src/Repository: unjs/mlly
Length of output: 1335
🏁 Script executed:
# Look at what happens to declaredExports - check if _filterStatement is applied
sed -n '400,500p' src/analyze.ts | cat -nRepository: unjs/mlly
Length of output: 3969
🏁 Script executed:
# Check what _batchParseExportNames does
rg "_batchParseExportNames" -B 5 -A 20 src/analyze.tsRepository: unjs/mlly
Length of output: 2300
🏁 Script executed:
# Test if truncated code actually causes a parse error
node <<'NODE'
// Simulate what _batchParseExportNames does
const code = "export const foo = fn(a, b"; // Truncated like the regex produces
try {
// Try to parse this truncated code
const result = new Function('parseSync', `
return parseSync("input.ts", "${code}", { sourceType: "module" });
`);
console.log("Parse succeeded unexpectedly");
} catch (e) {
console.log("Parse failed as expected:", e.message.substring(0, 100));
}
// Also test with actual oxc-parser if available
try {
const { parseSync } = require('@oxc-parser/wasm');
const combined = "export const foo = fn(a, b";
const result = parseSync("input.ts", combined, { sourceType: "module" });
console.log("oxc-parser result:", result);
} catch (e) {
console.log("oxc-parser error (expected):", e.message.substring(0, 80));
}
NODERepository: unjs/mlly
Length of output: 191
🏁 Script executed:
# Verify the complete order: does needsParse get populated and passed to parser before _filterStatement?
sed -n '400,500p' src/analyze.ts | grep -n "needsParse\|_batchParseExportNames\|_filterStatement"Repository: unjs/mlly
Length of output: 443
🏁 Script executed:
# Check if _batchParseExportNames is wrapped in try-catch
sed -n '450,470p' src/analyze.ts | cat -nRepository: unjs/mlly
Length of output: 768
🏁 Script executed:
# Look at the complete findExports to see error handling
sed -n '420,500p' src/analyze.ts | grep -B 5 -A 5 "try\|catch"Repository: unjs/mlly
Length of output: 303
Raw regex matches may cause parser failures on valid code.
The EXPORT_DECAL_RE regex produces incomplete code slices for valid declarations. For example, export const foo = fn(a, b) matches only as export const foo = fn(a, b (truncated before the closing paren), and export const foo = [1, 2] matches as export const foo = [1, 2] (complete only by coincidence). These raw code slices from lines 426–450 are passed directly to _batchParseExportNames() at line 454, which joins them and calls parseSync(). This happens before _filterStatement() validation at line 498, and without any error handling. When the parser's staticExports count doesn't match the slices count, the function throws. Additionally, the regex can match inside string literals and comments, passing invalid slices to the parser.
Wrap the _batchParseExportNames() call in try-catch to gracefully fall back to regex-based parsing on invalid slices, or derive parse input from actual statement bounds rather than regex matches.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/analyze.ts` around lines 422 - 456, The code is passing raw regex-derived
slices to _batchParseExportNames(parseSync) which can be truncated or include
matches inside strings/comments and causes the parser to throw; modify the block
that builds needsParse and calls _batchParseExportNames so that the call is
wrapped in a try-catch around _batchParseExportNames(needsParse.map(e =>
e.code)), and on error fall back to a safe strategy (e.g., use
_parseNamedExportsRe or other regex-based parsing for those specific entries, or
compute slices from actual statement bounds instead of EXPORT_* regexes); ensure
declaredExports/namedExports/destructuredExports entries keep a valid .names
value (use existing helpers like _parseNamedExportsRe) so downstream
_filterStatement and parseSync counts remain consistent.
There was a problem hiding this comment.
oxc-parser handles the error gracefully... can we just rely on it?
Errors: [
{
"severity": "Error",
"message": "Expected `)` but found `EOF`",
"labels": [
{
"message": "`)` expected",
"start": 26,
"end": 26
}
],
"helpMessage": null,
"codeframe": "\n x Expected `)` but found `EOF`\n ,-[input.ts:1:27]\n 1 | export const foo = fn(a, b\n `----\n"
}
]
Static exports: [
{
"start": 0,
"end": 26,
"entries": [
{
"start": 7,
"end": 26,
"moduleRequest": null,
"importName": {
"kind": "None",
"name": null,
"start": null,
"end": null
},
"exportName": {
"kind": "Name",
"name": "foo",
"start": 13,
"end": 16
},
"localName": {
"kind": "Name",
"name": "foo",
"start": 13,
"end": 16
},
"isType": false
}
]
}
]There was a problem hiding this comment.
@cernymatej, that's a great finding — the evidence is compelling that parseSync doesn't throw for recoverable parse errors and still extracts the correct export names. Oxc features "advanced error handling with meaningful messages" and error recovery, and the result of parseSync contains an errors array for any issues encountered.
That said, there is one edge case worth considering: the parser returns an empty Program on unrecoverable errors — recoverable errors are stored inside errors. The ParserReturn struct has a panicked field for exactly this scenario. If parseSync ever enters a panicked state (e.g., a completely garbled code slice), module.staticExports would be empty, and the existing count-mismatch guard would throw:
if (module.staticExports.length !== codeSlices.length) {
throw new Error(
`Expected ${codeSlices.length} exports but parsed ${module.staticExports.length}`,
);
}For the typical truncated-export cases you demonstrated, relying on oxc-parser's recovery is absolutely fine. To make the code robust against the unrecoverable edge case, a small defensive check would be sufficient — either checking for the panicked state directly or wrapping the entire call in a try-catch and falling back to regex-based parsing:
function _batchParseExportNames(
codeSlices: string[],
keepTypes = false,
): string[][] {
const combined = codeSlices.join("\n");
const { module } = parseSync("input.ts", combined, { sourceType: "module" });
if (module.staticExports.length !== codeSlices.length) {
// Fallback: return empty names per slice so downstream code stays consistent
return codeSlices.map(() => []);
}
return module.staticExports.map((e) =>
e.entries
.filter((e) => keepTypes || !e.isType)
.map((e) => e.exportName.name!),
);
}This is a minor robustness concern rather than a critical bug — the truncated slice case you tested clearly works. Up to you whether you want to harden that guard.
| function _parseNamedExportsRe(exportsStr: string, keepTypes = false): string[] { | ||
| return exportsStr | ||
| .replace(/^\r?\n?/, "") | ||
| .split(/\s*,\s*/g) | ||
| .filter((name) => name && (keepTypes || !TYPE_RE.test(name))) | ||
| .map((name) => name.replace(/^.*?\sas\s/, "").trim()); |
There was a problem hiding this comment.
Mixed inline type exports still fall through.
A boolean keepTypes switch cannot represent export { type Foo, bar }. findExports() drops Foo, and findTypeExports() only scans export type { ... }, so that valid TS form never appears in either API.
This helper likely needs a typesOnly mode, or it should return per-entry metadata instead of only names.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/analyze.ts` around lines 604 - 609, The _parseNamedExportsRe helper
currently can’t handle mixed inline `type` modifiers (e.g. `export { type Foo,
bar }`) because keepTypes is a boolean; change _parseNamedExportsRe to return
per-entry metadata (e.g. objects with { name, isType }) instead of only names so
callers can distinguish `type`-prefixed entries, update any callers such as
findExports and findTypeExports to consume the metadata (or add a typesOnly flag
to _parseNamedExportsRe if you prefer minimal change) and use the isType flag to
route names to the correct API; ensure you still strip `as` aliases (the
existing .replace(/^.*?\sas\s/, "") logic) and keep referencing TYPE_RE to
detect type tokens.
| // TODO: support nested destructuring | ||
| it.fails("export const { foo = { a: 1, b: 2 }, bar } = obj", () => { | ||
| const matches = findExports( | ||
| "export const { foo = { a: 1, b: 2 }, bar } = obj", | ||
| ); | ||
| expect(matches).toHaveLength(1); | ||
| expect(matches[0].names).toEqual(["foo", "bar"]); | ||
| }); | ||
| it.fails("export const { foo, bar: { baz, qux } } = obj", () => { | ||
| const matches = findExports( | ||
| "export const { foo, bar: { baz, qux } } = obj", | ||
| ); | ||
| expect(matches).toHaveLength(1); | ||
| expect(matches[0].names).toEqual(["foo", "baz", "qux"]); | ||
| }); |
There was a problem hiding this comment.
btw. this didn't work before either
I'm not sure if we can support this case with only simple regex targeting to get the code slices for the parser. And parsing the whole file turned out to slow things down significantly.
fix unjs/unimport#502
fix unjs/unimport#500
fix nuxt/nuxt#34432
fix nuxt/nuxt#34407
fix unjs/unimport#273 (possibly?)
This replaces @danielroe's solution for handling commas in export statemens from #336 with a proper AST-based parser, which should handle the remaining edge cases.
Probably the main difference compared the current implementation in
mainis that the function can now fail if we try to parse non-valid JavaScript syntax.Other than that, it is mostly identical in terms of performance in the majority of cases. We still use a regex to find the exports and only parse the ones that actually need it, reducing the performance impact significantly.
In the more complex cases where parsing does need to happen, all code slices are batched into a single
parseSynccall, which has also proven to speed things up a lot. The calls where we parse the code are ~1.5x slower than the old regex-only approach, but it no longer produces incorrect results.For now, I haven't touched the part of the code that uses
acornfor getting token locations. I haven't measured the performance ofacorn, but I feel like tokenizing is much lighter than parsing the entire source file viaoxc-parser... Maybe we can revisit this in the future?Summary by CodeRabbit
New Features
Tests