[Storage] Change default parallelism to CPU-based value#26880
[Storage] Change default parallelism to CPU-based value#26880tanyasethi-msft wants to merge 7 commits into
Conversation
…, azfile, azdatalake Update default concurrency for uploads and downloads from a fixed value of 5 (or 1 for stream uploads) to Clamp(NumCPU, 8, 96), matching the Rust SDK implementation. This improves throughput on modern multi-core VMs while clamping prevents resource exhaustion on small or very large machines. Adds AZURE_STORAGE_USE_LEGACY_DEFAULT_CONCURRENCY=true env var opt-out to revert to the previous default of 5.
There was a problem hiding this comment.
Pull request overview
This PR updates storage transfer defaults so azblob, azfile, and azdatalake choose parallelism from CPU count instead of fixed legacy values, with an environment-variable opt-out for compatibility.
Changes:
- Adds
DefaultConcurrencyValue()in shared storage helpers and applies it to batch transfers and stream uploads. - Updates option comments to describe CPU-based concurrency defaults.
- Adds unit coverage for default concurrency selection and live benchmarks for azblob parallelism.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
sdk/storage/azfile/internal/shared/batch_transfer.go |
Adds CPU-based default concurrency helper and uses it for batch transfers. |
sdk/storage/azfile/internal/shared/batch_transfer_test.go |
Adds tests for concurrency bounds, CPU matching, determinism, and legacy env var. |
sdk/storage/azfile/file/models.go |
Updates file transfer option docs and stream upload defaulting. |
sdk/storage/azdatalake/internal/shared/batch_transfer.go |
Adds CPU-based default concurrency helper and uses it for batch transfers. |
sdk/storage/azdatalake/internal/shared/batch_transfer_test.go |
Adds tests for concurrency bounds, CPU matching, determinism, and legacy env var. |
sdk/storage/azdatalake/file/models.go |
Updates datalake transfer option docs and stream upload defaulting. |
sdk/storage/azblob/internal/shared/shared_test.go |
Adds tests for shared concurrency default behavior. |
sdk/storage/azblob/internal/shared/batch_transfer.go |
Adds CPU-based default concurrency helper and uses it for batch transfers. |
sdk/storage/azblob/blockblob/parallelism_bench_test.go |
Adds live benchmarks for upload/download parallelism behavior. |
sdk/storage/azblob/blockblob/models.go |
Updates block blob transfer option docs and stream upload defaulting. |
sdk/storage/azblob/blockblob/chunkwriting_test.go |
Adjusts buffer allocation assertion for higher default stream concurrency. |
sdk/storage/azblob/blob/models.go |
Updates blob download option docs for CPU-based defaults. |
sdk/storage/azblob/blob/constants.go |
Deprecates fixed DefaultConcurrency and exposes DefaultConcurrencyValue(). |
| func (u *UploadStreamOptions) setDefaults() { | ||
| if u.Concurrency == 0 { | ||
| u.Concurrency = 1 | ||
| u.Concurrency = int(shared.DefaultConcurrencyValue()) |
There was a problem hiding this comment.
This method returns 5 if AZURE_STORAGE_USE_LEGACY_DEFAULT_CONCURRENCY environment variable is set. Whereas the previous default value for concurrency in UploadStreamOptions is 1.
| func (u *UploadStreamOptions) setDefaults() { | ||
| if u.Concurrency == 0 { | ||
| u.Concurrency = 1 | ||
| u.Concurrency = shared.DefaultConcurrencyValue() |
There was a problem hiding this comment.
Same as previous, this returns the previous default value as 5 instead of 1
| func (u *UploadStreamOptions) setDefaults() { | ||
| if u.Concurrency == 0 { | ||
| u.Concurrency = 1 | ||
| u.Concurrency = int(shared.DefaultConcurrencyValue()) |
| // DefaultConcurrency is the legacy default number of blocks downloaded or uploaded in parallel. | ||
| // | ||
| // Deprecated: Use DefaultConcurrencyValue() instead, which returns a value based on CPU core count. | ||
| DefaultConcurrency = shared.DefaultConcurrency //nolint:staticcheck // intentional re-export of deprecated const for backward compat |
There was a problem hiding this comment.
Add changelog entries in all the three packages
| // Each concurrent upload will create a buffer of size BlockSize. The default is based on | ||
| // CPU core count (min 8, max 96). Set AZURE_STORAGE_USE_LEGACY_DEFAULT_CONCURRENCY=true to revert to the previous default. | ||
| Concurrency int |
There was a problem hiding this comment.
Please check if data validation tests are already there for all the affected upload and download methods.
Summary
Clamp(NumCPU, 8, 96)DefaultConcurrencyValue()function (keeps oldDefaultConcurrencyconst deprecated for backward compat)AZURE_STORAGE_USE_LEGACY_DEFAULT_CONCURRENCY=trueenv var opt-out to revert to old defaults without code changesPerformance Impact
Benchmarked on Standard_D32s_v6 against a standard storage account, 4 MiB chunk size, 3 iterations per size.
UploadBuffer — OLD (concurrency=5) vs NEW (concurrency=32)
UploadStream — OLD (concurrency=1) vs NEW (concurrency=32)
DownloadBuffer — OLD (concurrency=5) vs NEW (concurrency=32)
Key Observations
Opt-out
Customers who experience issues can revert to the old default without code changes: