Improve resolveReferences performance#4004
Conversation
Batch placeholder replacement in resolveAllReferences so each source file gets one text.replace + one replaceWithText call, instead of one cycle per placeholder. Each replaceWithText call discards ts-morph's cached AST and re-parses the file, so the previous per-placeholder loop made the cost grow as O(placeholders per file * file size). For each file we now: - scan the text once to collect the placeholders actually present; - iterate the precomputed declaration/dependency maps (preserving the original insertion order so name-collision aliasing in addImport is unchanged); - accumulate a single placeholder -> local-name map; - apply all replacements in a single bulk text.replace and a single replaceWithText. Output is byte-identical on batch_modular and openai_modular smoke specs, and the binder integration tests (22) all pass unchanged. Real-SDK measurements (Microsoft.Network and Microsoft.Compute management-plane via 'tsp compile client.tsp --emit=@azure-tools/typespec-ts'): | Spec | resolve references (before) | resolve references (after) | Total onEmit (before) | Total onEmit (after) | |---------|----------------------------:|---------------------------:|----------------------:|---------------------:| | Network | 22:08.539 | 8.664s | 31:25.022 | 8:44.110 | | Compute | 3:20.601 | 3.455s | 5:17.883 | 1:49.726 | That's a 153x speedup on the binder phase for Network (22 minutes saved per SDK generation) and 58x for Compute (3.5 minutes saved). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
There is an existing test file for binder https://github.com/Azure/autorest.typescript/blob/main/packages/typespec-ts/test-next/integration/hooks/binder.test.ts and all cases pass. |
There was a problem hiding this comment.
Pull request overview
This PR optimizes the TypeSpec TS emitter’s binder reference resolution by batching placeholder replacements per source file, avoiding repeated ts-morph replaceWithText calls that force costly re-parses and scale poorly with placeholder count.
Changes:
- Precomputes placeholder→declaration/dependency lookup maps once per
resolveAllReferencescall. - Scans each source file once for present placeholders, builds a per-file replacement map, and applies all substitutions in a single bulk pass.
- Replaces per-placeholder replacement logic with
collectPlaceholders+applyReplacementsutilities.
| const text = sourceFile.getFullText(); | ||
| const placeholderRegex = new RegExp( | ||
| `${escapeRegExp(PLACEHOLDER_PREFIX)}.+?__`, | ||
| "g" | ||
| ); |
| // Pre-compute placeholder -> declaration/dependency maps | ||
| const declarationByPlaceholder = new Map< | ||
| string, | ||
| [unknown, DeclarationInfo | StaticHelperMetadata] |
There was a problem hiding this comment.
why the first element of the tuple is unknown? is it the return type of this.serializePlaceholder?
There was a problem hiding this comment.
oh wait it should be the input of this.serializePlaceholder.
There was a problem hiding this comment.
Yes, the unknown here is consistent with the key type of this.declarations: private declarations = new Map<unknown, DeclarationInfo>();.
maorleger
left a comment
There was a problem hiding this comment.
I'm far from an expert here but the perf numbers are definitely exciting! I linked to the ts-morph performance documentation in a comment and I wonder if there are other places where we can batch operations for quick wins, not blocking of course
There was a problem hiding this comment.
not blocking for this PR but I wonder if we can get some more wins by batching operations. This pattern is explicitly called out in https://ts-morph.com/manipulation/performance#performance-tip-batch-operations and may be an easy replacement to make
Have you seen the performance ts-morph docs ? Could be useful to point an agent to them and find low-hanging fruit to increase perf as well
There was a problem hiding this comment.
Great call, thanks for the pointer, Maor! I replaced the loop with a single addImportDeclarations bulk call in the latest commit.
Replace the per-import addImportDeclaration loop with a single addImportDeclarations bulk call. Each individual addImportDeclaration triggers a full re-parse of the source file in ts-morph, so emitting N imports per file was O(N) re-parses; bulk-add collapses this to one.
e2ec4ae to
923390f
Compare
Summary
Two complementary changes to
Binderthat reduce the cost of theresolve referencesemit phase:resolveAllReferencesso each source file gets one bulktext.replace+ onereplaceWithText, instead of one cycle per placeholder. EachreplaceWithTextdiscards ts-morph's cached AST and re-parses the file, so the previous per-placeholder loop grew as O(placeholders per file × file size).addImportDeclarationscall per file, instead of loopingaddImportDeclarationonce per import. Each individual call triggers a full file re-parse in ts-morph.What changed
For each source file the binder now:
collectPlaceholdersto collect placeholders actually present in the file.declarationByPlaceholder/dependencyByPlaceholdermaps (preserving the original insertion order, so name-collision aliasing inaddImportis unchanged).applyReplacementsin one bulktext.replace+ onereplaceWithText.addImportDeclarationscall (sort order unchanged).Only
packages/typespec-ts/src/framework/hooks/binder.tsis touched.Performance results
Measured by running
npx tsp compile client.tsp --emit=@azure-tools/typespec-tsagainst the real Azure spec files (specification/network/.../Network/client.tspandspecification/compute/.../Compute/client.tsp), with per-phaseconsole.timeinstrumentation around$onEmit. Single run per side.resolve references(before)resolve references(after)onEmit(before)onEmit(after)Incremental gain from the follow-up
addImportDeclarationscommit, on top of the placeholder-batching commit:resolve referencesbeforeSee the baseline timing breakdown in #3907 —
resolve referencesaccounted for 70.5% of total emit time on Network and 63.1% on Compute before this change.