Improve resolveReferences performance #4004

Merged
JialinHuang803 merged 3 commits into Azure:main from JialinHuang803:perf-binder-batched-replacement on May 28, 2026
JialinHuang803 (Member) commented May 26, 2026

Summary

Two complementary changes to Binder that reduce the cost of the resolve references emit phase:

  1. Batch placeholder replacement in resolveAllReferences so each source file gets one bulk text.replace + one replaceWithText, instead of one cycle per placeholder. Each replaceWithText discards ts-morph's cached AST and re-parses the file, so the previous per-placeholder loop grew as O(placeholders per file × file size).
  2. Bulk-add import declarations with a single addImportDeclarations call per file, instead of looping addImportDeclaration once per import. Each individual call triggers a full file re-parse in ts-morph.
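
To make the cost difference concrete, here is a minimal sketch that counts re-parses under both strategies. `CountingSourceFile` is a hypothetical stand-in for a ts-morph source file (it only counts `replaceWithText` calls), and the placeholder strings are invented for illustration; none of this is the actual binder code.

```typescript
// Hypothetical stand-in for a ts-morph SourceFile: it only counts how often
// the text is replaced, which is when the real library would re-parse.
class CountingSourceFile {
  reparseCount = 0;
  private content: string;

  constructor(content: string) {
    this.content = content;
  }

  getFullText(): string {
    return this.content;
  }

  replaceWithText(next: string): void {
    this.content = next;
    this.reparseCount++;
  }
}

// Old shape: one replaceWithText cycle per placeholder.
function replacePerPlaceholder(file: CountingSourceFile, names: Map<string, string>): void {
  for (const [placeholder, localName] of names) {
    file.replaceWithText(file.getFullText().split(placeholder).join(localName));
  }
}

// New shape: all substitutions happen on a plain string, then one replaceWithText.
function replaceBatched(file: CountingSourceFile, names: Map<string, string>): void {
  let updated = file.getFullText();
  for (const [placeholder, localName] of names) {
    updated = updated.split(placeholder).join(localName);
  }
  file.replaceWithText(updated);
}

// Invented placeholders and names, for illustration only.
const sampleNames = new Map<string, string>([
  ["__PLACEHOLDER_a__", "Foo"],
  ["__PLACEHOLDER_b__", "Bar"],
  ["__PLACEHOLDER_c__", "Baz"]
]);
const sampleSource = "const x: __PLACEHOLDER_a__ = __PLACEHOLDER_b__(__PLACEHOLDER_c__);";

const perPlaceholderFile = new CountingSourceFile(sampleSource);
replacePerPlaceholder(perPlaceholderFile, sampleNames);

const batchedFile = new CountingSourceFile(sampleSource);
replaceBatched(batchedFile, sampleNames);

// Three re-parses versus one, with identical resulting text.
console.log(perPlaceholderFile.reparseCount, batchedFile.reparseCount);
```

With three placeholders the per-placeholder loop triggers three simulated re-parses and the batched version one; both produce the same final text.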

What changed

For each source file the binder now:

  1. Scans the text once via collectPlaceholders to collect placeholders actually present in the file.
  2. Iterates the precomputed declarationByPlaceholder / dependencyByPlaceholder maps (preserving the original insertion order, so name-collision aliasing in addImport is unchanged).
  3. Accumulates a single placeholder → local-name map.
  4. Applies all replacements via applyReplacements in one bulk text.replace + one replaceWithText.
  5. Emits all collected imports for the file in a single addImportDeclarations call (sort order unchanged).
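
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in binder.ts: the value of `PLACEHOLDER_PREFIX`, the helper bodies, and the sample map contents are all invented; only the regex shape (`prefix` + `.+?__`) is taken from the diff excerpt in the review thread.

```typescript
// Assumed prefix, for illustration only; the real constant lives in binder.ts.
const PLACEHOLDER_PREFIX = "__PLACEHOLDER_";

function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// A fresh global regex per call avoids sharing lastIndex state between passes.
function placeholderPattern(): RegExp {
  return new RegExp(`${escapeRegExp(PLACEHOLDER_PREFIX)}.+?__`, "g");
}

// Step 1: scan the text once and keep the distinct placeholders it contains.
function collectPlaceholders(sourceText: string): Set<string> {
  return new Set(sourceText.match(placeholderPattern()) ?? []);
}

// Step 4: one bulk replace; placeholders without a mapping are left untouched.
function applyReplacements(sourceText: string, localNames: Map<string, string>): string {
  return sourceText.replace(placeholderPattern(), (found) => localNames.get(found) ?? found);
}

// Hypothetical precomputed map (placeholder -> declared name), in insertion order.
const declarationNames = new Map<string, string>([
  ["__PLACEHOLDER_1__", "Widget"],
  ["__PLACEHOLDER_2__", "createWidget"],
  ["__PLACEHOLDER_3__", "NotUsedInThisFile"]
]);

const fileText = "const a: __PLACEHOLDER_1__ = __PLACEHOLDER_2__(); let b: __PLACEHOLDER_1__;";
const present = collectPlaceholders(fileText);

// Steps 2-3: walk the precomputed map in order, keeping only placeholders in this file.
const localNames = new Map<string, string>();
for (const [placeholder, declaredName] of declarationNames) {
  if (present.has(placeholder)) {
    localNames.set(placeholder, declaredName);
  }
}

const rewritten = applyReplacements(fileText, localNames);
console.log(rewritten);
```

In the real binder the rewritten text would then be handed to a single `replaceWithText` call, followed by one `addImportDeclarations` call for the file.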

Only packages/typespec-ts/src/framework/hooks/binder.ts is touched.

Performance results

Measured by running npx tsp compile client.tsp --emit=@azure-tools/typespec-ts against the real Azure spec files (specification/network/.../Network/client.tsp and specification/compute/.../Compute/client.tsp), with per-phase console.time instrumentation around $onEmit. Single run per side.
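
For readers who want to reproduce this kind of measurement, a per-phase timer can be as small as the sketch below. `timePhase` and the phase label are hypothetical names; the actual instrumentation used for these numbers is not shown in this PR.

```typescript
// Hypothetical helper: wraps one synchronous emit phase in console.time / console.timeEnd.
function timePhase<T>(label: string, phase: () => T): T {
  console.time(label);
  try {
    return phase();
  } finally {
    // Runs even if the phase throws, so the timer is always closed.
    console.timeEnd(label);
  }
}

// Usage: the callback stands in for a real phase such as reference resolution.
const phaseResult = timePhase("resolve references", () => {
  let total = 0;
  for (let i = 0; i < 1000; i++) {
    total += i;
  }
  return total;
});
console.log(phaseResult);
```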

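| Spec    | resolve references (before) | resolve references (after) | Total onEmit (before) | Total onEmit (after) |
|---------|----------------------------:|---------------------------:|----------------------:|---------------------:|
| Network |                  22:08.539  |                     4.397s |            31:25.022  |            7:27.367  |
| Compute |                   3:20.601  |                     2.082s |             5:17.883  |            1:47.209  |
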
  • Network: ~302× speedup on the binder phase, ~24 min saved per SDK generation (31:25 → 7:27 end-to-end).
  • Compute: ~96× speedup on the binder phase, ~3.5 min saved per SDK generation (5:18 → 1:47 end-to-end).
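
The speedup figures follow directly from the table. The sketch below redoes the arithmetic, assuming the two duration formats seen above (`m:ss.mmm` and `s.mmms`); `toSeconds` is a hypothetical helper written for this check.

```typescript
// Parse "m:ss.mmm" or "s.mmms" into seconds.
function toSeconds(duration: string): number {
  if (duration.endsWith("s")) {
    return Number(duration.slice(0, -1));
  }
  const [minutes, seconds] = duration.split(":");
  return Number(minutes) * 60 + Number(seconds);
}

// Binder phase: before / after.
const networkSpeedup = toSeconds("22:08.539") / toSeconds("4.397s"); // 1328.539 / 4.397
const computeSpeedup = toSeconds("3:20.601") / toSeconds("2.082s"); // 200.601 / 2.082

// End-to-end minutes saved: total onEmit before minus after.
const networkMinutesSaved = (toSeconds("31:25.022") - toSeconds("7:27.367")) / 60;
const computeMinutesSaved = (toSeconds("5:17.883") - toSeconds("1:47.209")) / 60;

console.log(Math.round(networkSpeedup), Math.round(computeSpeedup));
console.log(networkMinutesSaved.toFixed(1), computeMinutesSaved.toFixed(1));
```

Rounded, this gives 302× and 96× for the binder phase and roughly 24.0 and 3.5 minutes saved end-to-end, matching the bullets above.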

Incremental gain from the follow-up addImportDeclarations commit, on top of the placeholder-batching commit:

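| Spec    | resolve references (before) | resolve references (after) | Δ               |
|---------|----------------------------:|---------------------------:|----------------:|
| Network |                      8.664s |                     4.397s | −49% (−4.27s)   |
| Compute |                      2.942s |                     2.082s | −29% (−0.86s)   |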

See the baseline timing breakdown in #3907: resolve references accounted for 70.5% of total emit time on Network and 63.1% on Compute before this change.

Batch placeholder replacement in resolveAllReferences so each source
file gets one text.replace + one replaceWithText call, instead of one
cycle per placeholder. Each replaceWithText call discards ts-morph's
cached AST and re-parses the file, so the previous per-placeholder loop
made the cost grow as O(placeholders per file * file size).

For each file we now:
- scan the text once to collect the placeholders actually present;
- iterate the precomputed declaration/dependency maps (preserving the
  original insertion order so name-collision aliasing in addImport is
  unchanged);
- accumulate a single placeholder -> local-name map;
- apply all replacements in a single bulk text.replace and a single
  replaceWithText.

Output is byte-identical on batch_modular and openai_modular smoke
specs, and the binder integration tests (22) all pass unchanged.

Real-SDK measurements (Microsoft.Network and Microsoft.Compute
management-plane via 'tsp compile client.tsp --emit=@azure-tools/typespec-ts'):

| Spec    | resolve references (before) | resolve references (after) | Total onEmit (before) | Total onEmit (after) |
|---------|----------------------------:|---------------------------:|----------------------:|---------------------:|
| Network |                  22:08.539  |                     8.664s |            31:25.022  |            8:44.110  |
| Compute |                   3:20.601  |                     3.455s |             5:17.883  |            1:49.726  |

That's a 153x speedup on the binder phase for Network (22 minutes
saved per SDK generation) and 58x for Compute (3.5 minutes saved).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

JialinHuang803 linked an issue May 26, 2026 that may be closed by this pull request
qiaozha self-assigned this May 27, 2026
qiaozha added HRLC p0 priority 0 labels May 27, 2026
JialinHuang803 changed the title from "Improve JS emitter performance" to "Improve resolveReferences performance for JS emitter" May 27, 2026

Copilot AI (Contributor) left a comment

Pull request overview

This PR optimizes the TypeSpec TS emitter’s binder reference resolution by batching placeholder replacements per source file, avoiding repeated ts-morph replaceWithText calls that force costly re-parses and scale poorly with placeholder count.

Changes:

  • Precomputes placeholder→declaration/dependency lookup maps once per resolveAllReferences call.
  • Scans each source file once for present placeholders, builds a per-file replacement map, and applies all substitutions in a single bulk pass.
  • Replaces per-placeholder replacement logic with collectPlaceholders + applyReplacements utilities.

Comment on lines +490 to +494

    const text = sourceFile.getFullText();
    const placeholderRegex = new RegExp(
      `${escapeRegExp(PLACEHOLDER_PREFIX)}.+?__`,
      "g"
    );
    // Pre-compute placeholder -> declaration/dependency maps
    const declarationByPlaceholder = new Map<
      string,
      [unknown, DeclarationInfo | StaticHelperMetadata]

Member

Why is the first element of the tuple unknown? Is it the return type of this.serializePlaceholder?

Member

Oh wait, it should be the input of this.serializePlaceholder.

Member Author

Yes, the unknown here is consistent with the key type of this.declarations: private declarations = new Map<unknown, DeclarationInfo>();.
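
To illustrate the point of this thread, here is a small sketch of a placeholder-keyed lookup whose tuple keeps the original, opaquely typed key. Every name ending in `Sketch`, the serializer body, and the sample keys are hypothetical; only the `Map<unknown, DeclarationInfo>` key type comes from the discussion above.

```typescript
// Hypothetical shape; the real DeclarationInfo in binder.ts has its own fields.
interface DeclarationInfoSketch {
  name: string;
}

// Keyed by an opaque reference key, mirroring `new Map<unknown, DeclarationInfo>()`.
const declarationsSketch = new Map<unknown, DeclarationInfoSketch>();

// Hypothetical serializer: assigns each distinct key a stable placeholder string.
const placeholderIds = new Map<unknown, number>();
function serializePlaceholderSketch(refkey: unknown): string {
  let id = placeholderIds.get(refkey);
  if (id === undefined) {
    id = placeholderIds.size;
    placeholderIds.set(refkey, id);
  }
  return `__PLACEHOLDER_${id}__`;
}

// Keys can be anything, which is why the tuple's first element is `unknown`.
const modelKey = { kind: "Model", name: "Widget" };
const helperKey = Symbol("helper");
declarationsSketch.set(modelKey, { name: "Widget" });
declarationsSketch.set(helperKey, { name: "serializeWidget" });

// Precompute once: placeholder string -> [original key, declaration].
const byPlaceholderSketch = new Map<string, [unknown, DeclarationInfoSketch]>();
for (const [refkey, info] of declarationsSketch) {
  byPlaceholderSketch.set(serializePlaceholderSketch(refkey), [refkey, info]);
}

const firstEntry = byPlaceholderSketch.get("__PLACEHOLDER_0__");
console.log(firstEntry?.[1].name);
```

The first tuple element is the serializer's input rather than its output, so it inherits the map's `unknown` key type.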

maorleger (Member) left a comment

I'm far from an expert here, but the perf numbers are definitely exciting! I linked to the ts-morph performance documentation in a comment, and I wonder if there are other places where we can batch operations for quick wins. Not blocking, of course.

Member

Not blocking for this PR, but I wonder if we can get some more wins by batching operations. This pattern is explicitly called out in https://ts-morph.com/manipulation/performance#performance-tip-batch-operations and may be an easy replacement to make.

Have you seen the ts-morph performance docs? It could be useful to point an agent to them to find low-hanging fruit for further perf gains.

Member Author

Great call, thanks for the pointer, Maor! I replaced the loop with a single addImportDeclarations bulk call in the latest commit.

Replace the per-import addImportDeclaration loop with a single addImportDeclarations bulk call. Each individual addImportDeclaration triggers a full re-parse of the source file in ts-morph, so emitting N imports per file was O(N) re-parses; bulk-add collapses this to one.
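
As a rough sketch of the bulk-add shape: collect everything a file needs into an array of import structures first, then hand that array over in one call. `ImportStructureSketch` is a local stand-in for the subset of ts-morph's import structure used here, and `buildImportStructures`, the module paths, and the names are hypothetical; the binder's own collection and sort logic is not reproduced.

```typescript
// Local stand-in for the subset of ts-morph's import structure used in this sketch.
interface ImportStructureSketch {
  moduleSpecifier: string;
  namedImports: string[];
}

// Hypothetical pending import: one named symbol from one module.
interface PendingImport {
  moduleSpecifier: string;
  importedName: string;
}

// Group pending imports by module, preserving first-seen order and dropping duplicates.
function buildImportStructures(pending: PendingImport[]): ImportStructureSketch[] {
  const byModule = new Map<string, Set<string>>();
  for (const { moduleSpecifier, importedName } of pending) {
    const names = byModule.get(moduleSpecifier) ?? new Set<string>();
    names.add(importedName);
    byModule.set(moduleSpecifier, names);
  }
  return Array.from(byModule, ([moduleSpecifier, names]) => ({
    moduleSpecifier,
    namedImports: Array.from(names)
  }));
}

const structures = buildImportStructures([
  { moduleSpecifier: "./models.js", importedName: "Widget" },
  { moduleSpecifier: "./helpers.js", importedName: "serializeWidget" },
  { moduleSpecifier: "./models.js", importedName: "Gadget" },
  { moduleSpecifier: "./models.js", importedName: "Widget" }
]);

// With ts-morph, the whole array would then go through a single call, e.g.
//   sourceFile.addImportDeclarations(structures);
// instead of calling sourceFile.addImportDeclaration(...) once per entry.
console.log(JSON.stringify(structures));
```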

JialinHuang803 force-pushed the perf-binder-batched-replacement branch from e2ec4ae to 923390f May 28, 2026 06:49
JialinHuang803 changed the title from "Improve resolveReferences performance for JS emitter" to "Improve resolveReferences performance" May 28, 2026
JialinHuang803 merged commit 119dc3a into Azure:main May 28, 2026
16 checks passed
