Coalesce concurrent native finder refreshes with the same key #1598

Open
StellaHuang95 wants to merge 1 commit into microsoft:main from StellaHuang95:refreshDedup

@StellaHuang95

Contributor

Summary

Fixes #1587.

At startup, several environment managers (venv, system, pyenv, pipenv, poetry, conda) each call nativeFinder.refresh(true, undefined) concurrently — typically triggered when the Python extension calls api.refreshEnvironments(undefined) via its triggerRefresh on activation. Every one of those calls resolves to the same cache key ('all') but is queued as a distinct task on the native finder's worker pool, which runs with concurrency = 1. Neither the worker pool nor doRefresh deduplicates identical submissions, so N parallel callers become N sequential full PET discoveries — same input, same output, N-1 wasted scans.

The reporter (pytorch workspace) saw the result of this stacking:

[Pipenv] Environment discovery took 199.2s (found 0 environments).
[Poetry] Environment discovery took 253.8s (found 0 environments).

In addition, the "Setup appears hung during stage: managerRegistration" watchdog fired, and the same "Discovered env: <path>" lines repeated 4-5× in the log, which is the smoking gun that PET ran the same scan multiple times back-to-back.
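
The stacking described in this summary can be reproduced in miniature. The sketch below is not the extension's code: `SerialQueue`, `fakeScan`, `refreshWithoutDedup`, and `simulateStartup` are hypothetical stand-ins for a concurrency-1 worker pool and a PET discovery, assumed only for illustration. It shows that, without deduplication, six concurrent callers produce six sequential scans.

```typescript
// Hypothetical stand-in for a worker pool that runs one task at a time.
class SerialQueue {
    private tail: Promise<unknown> = Promise.resolve();

    addToQueue<T>(task: () => Promise<T>): Promise<T> {
        // Each task starts only after the previous one has settled.
        const result = this.tail.then(task);
        // Keep the chain usable even if a task rejects.
        this.tail = result.catch(() => undefined);
        return result;
    }
}

let scanCount = 0;

// Hypothetical stand-in for one full discovery scan.
async function fakeScan(): Promise<string[]> {
    scanCount += 1;
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    return ['/usr/bin/python3'];
}

const queue = new SerialQueue();

// No dedup: every caller queues its own task, even for identical input.
function refreshWithoutDedup(): Promise<string[]> {
    return queue.addToQueue(fakeScan);
}

// Fires one refresh per manager concurrently and reports how many scans ran.
async function simulateStartup(managers: string[]): Promise<number> {
    scanCount = 0;
    await Promise.all(managers.map(() => refreshWithoutDedup()));
    return scanCount;
}
```

Calling `simulateStartup(['venv', 'system', 'pyenv', 'pipenv', 'poetry', 'conda'])` resolves to the number of managers, since each caller's task runs to completion before the next one starts.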

Fix

Add an in-flight tracking map (Map<string, Promise<NativeInfo[]>>) in NativePythonFinderImpl and a guard at the top of handleHardRefresh: if a refresh for the same key is already running, return the existing promise instead of queueing another task. The slot is cleared in .finally so a rejected refresh does not poison future callers — sequential refreshes still each run a fresh PET scan.

handleSoftRefresh already falls through to handleHardRefresh on cache miss, so it picks up the dedup automatically — a concurrent soft + hard for the same key now share one PET scan.
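
A minimal sketch of the guard described in this section, assuming a simplified finder. `FinderSketch`, `SerialQueue`, the `scan` callback, and the reduced `NativeInfo` shape are hypothetical and only illustrate the pattern; the names `inFlightRefreshes`, `handleHardRefresh`, `handleSoftRefresh`, and `addToQueue` follow the description, but the bodies here are not the actual implementation.

```typescript
// Simplified, hypothetical shape; the real NativeInfo type is richer.
type NativeInfo = { executable: string };

// Hypothetical stand-in for a worker pool with concurrency = 1.
class SerialQueue {
    private tail: Promise<unknown> = Promise.resolve();

    addToQueue<T>(task: () => Promise<T>): Promise<T> {
        const result = this.tail.then(task);
        this.tail = result.catch(() => undefined);
        return result;
    }
}

class FinderSketch {
    scanCount = 0;
    private readonly scan: () => Promise<NativeInfo[]>;
    private readonly pool = new SerialQueue();
    private readonly cache = new Map<string, NativeInfo[]>();
    private readonly inFlightRefreshes = new Map<string, Promise<NativeInfo[]>>();

    constructor(scan: () => Promise<NativeInfo[]>) {
        this.scan = scan;
    }

    handleHardRefresh(key: string): Promise<NativeInfo[]> {
        // Guard: join a refresh that is already running for this key.
        const existing = this.inFlightRefreshes.get(key);
        if (existing) {
            return existing;
        }
        const promise = this.pool
            .addToQueue(async () => {
                this.scanCount += 1;
                return this.scan();
            })
            .then((result) => {
                this.cache.set(key, result);
                return result;
            })
            // Clear the slot on success and on failure, so a rejected
            // refresh does not poison later callers.
            .finally(() => {
                this.inFlightRefreshes.delete(key);
            });
        this.inFlightRefreshes.set(key, promise);
        return promise;
    }

    handleSoftRefresh(key: string): Promise<NativeInfo[]> {
        // On a cache miss, fall through to the hard path and share its dedup.
        const cached = this.cache.get(key);
        return cached ? Promise.resolve(cached) : this.handleHardRefresh(key);
    }
}
```

In this sketch, concurrent calls with the same key receive the same promise object, while a call made after that promise settles starts a new scan, which matches the intent that sequential refreshes stay fresh.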

Why this layer

  • All duplicate-refresh paths converge at handleHardRefresh (manager refreshes, conda's getConda fallback via native.refresh(false), the Python extension's triggerRefresh, the Refresh All Environment Managers command, external API consumers). Fixing here covers every caller in one place.
  • No public API change, no observable semantic change for sequential refreshes, no test breakage.
  • Mirrors the existing inFlight: Map<string, Promise<…>> idiom in src/features/inlineScriptLazyDetector.ts.

Expected impact for #1587

| Metric | Before | After |
| --- | --- | --- |
| PET scans triggered at startup | ~6-7 | 1 |
| Pipenv discovery warning | 106-199 s | ~25-35 s |
| Poetry discovery warning | 121-253 s | ~25-35 s |
| Setup appears hung during stage: managerRegistration | fires | should not fire |
| Total startup-to-ready | ~150 s | ~40-50 s (~3× wall-clock) |

The fix eliminates duplicate-scan stacking; it does not speed up a single PET scan (intrinsic ~25-35 s on pytorch on the reporter's machine, which is separate from this change).

Verification

  • ✅ Webpack production build succeeds.
  • ✅ ESLint clean on the changed file.
  • ✅ Unit tests: 1215 pass (same 3 unrelated pre-existing failures on main).
  • ✅ Built a VSIX with this branch and a VSIX from main, diffed the minified handleHardRefresh body — the fix VSIX has the new structure (inFlightRefreshes.get → guard → addToQueue.then.finally), main has the old (await pool.addToQueue).

Tested manually by inspecting the minified output of both builds; would appreciate eyes from someone with the pytorch repro to confirm wall-clock improvement on a real run.

What this does NOT address (intentionally)

  • Single PET scan latency on large workspaces (separate, PET-side).
  • Cross-session cache misses on remote (#1581, "Cross-session cache misses on every fresh remote (SSH / WSL / dev container / codespace)"; separate fix in flight).
  • The Python extension's triggerRefresh on every activation (worth a follow-up; it could skip the refresh when our cache is non-empty).
  • The double-init in InternalEnvironmentManager.refresh's tail-call to getEnvironments('all') (separate small follow-up; produces the "phantom" lowercase-only refresh log lines).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@StellaHuang95

Contributor Author

Rich verified that he is no longer seeing log lines like this one: 2026-06-18 15:53:31.276 [warning] [Pipenv] Environment discovery took 106.2s (found 0 environments). If this is causing problems, please report it: https://github.com/microsoft/vscode-python-environments/issues/new

Development

Successfully merging this pull request may close these issues.

Environment discovery is taking a VERY long time