[Storage] Change default parallelism to CPU-based value#26880

Open
tanyasethi-msft wants to merge 7 commits into main from feat/storage/parallelism

tanyasethi-msft (Member) commented May 25, 2026

Summary

  • Changes default concurrency for parallel uploads/downloads from fixed values (5 for buffer/file, 1 for stream) to Clamp(NumCPU, 8, 96)
  • Adds DefaultConcurrencyValue() function (keeps old DefaultConcurrency const deprecated for backward compat)
  • Adds AZURE_STORAGE_USE_LEGACY_DEFAULT_CONCURRENCY=true env var opt-out to revert to old defaults without code changes
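As an illustration of the Clamp(NumCPU, 8, 96) rule in the summary above, here is a minimal sketch. The helper name clampConcurrency and the constants are hypothetical and are not taken from the PR's code; only the bounds 8 and 96 come from the description.

```go
package main

import (
	"fmt"
	"runtime"
)

const (
	minConcurrency = 8  // lower bound from the PR summary
	maxConcurrency = 96 // upper bound from the PR summary
)

// clampConcurrency is a hypothetical helper that bounds n to
// the range [minConcurrency, maxConcurrency].
func clampConcurrency(n int) int {
	if n < minConcurrency {
		return minConcurrency
	}
	if n > maxConcurrency {
		return maxConcurrency
	}
	return n
}

func main() {
	// Small, mid-sized, and very large machines.
	for _, cpus := range []int{2, 32, 192} {
		fmt.Printf("NumCPU=%d -> concurrency=%d\n", cpus, clampConcurrency(cpus))
	}
	// The value this sketch would pick on the current machine.
	fmt.Println("this machine:", clampConcurrency(runtime.NumCPU()))
}
```

Under this rule a 2-core machine would get 8, a 32-core machine would get 32 (the value benchmarked below), and a 192-core machine would be capped at 96.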

Performance Impact

Benchmarked on Standard_D32s_v6 against a standard storage account, 4 MiB chunk size, 3 iterations per size.

UploadBuffer — OLD (concurrency=5) vs NEW (concurrency=32)

| Size | OLD (5) | NEW (32) | Improvement |
|---|---|---|---|
| 256 MB | 94 MB/s | 90 MB/s | ~same |
| 512 MB | 165 MB/s | 584 MB/s | +254% |
| 1 GiB | 176 MB/s | 742 MB/s | +322% |
| 2 GiB | 177 MB/s | 897 MB/s | +407% |
| 4 GiB | 168 MB/s | 1,008 MB/s | +500% |

UploadStream — OLD (concurrency=1) vs NEW (concurrency=32)

| Size | OLD (1) | NEW (32) | Improvement |
|---|---|---|---|
| 256 MB | 87 MB/s | 287 MB/s | +230% |
| 512 MB | 88 MB/s | 375 MB/s | +326% |
| 1 GiB | 90 MB/s | 483 MB/s | +437% |
| 2 GiB | 89 MB/s | 503 MB/s | +465% |
| 4 GiB | 86 MB/s | 542 MB/s | +530% |

DownloadBuffer — OLD (concurrency=5) vs NEW (concurrency=32)

| Size | OLD (5) | NEW (32) | Improvement |
|---|---|---|---|
| 256 MB | 205 MB/s | 251 MB/s | +22% |
| 512 MB | 168 MB/s | 459 MB/s | +173% |
| 1 GiB | 150 MB/s | 654 MB/s | +336% |
| 2 GiB | 155 MB/s | 756 MB/s | +388% |
| 4 GiB | 151 MB/s | 822 MB/s | +444% |

Key Observations

  • UploadStream sees the largest gains — it was previously sequential (concurrency=1), now fully parallel
  • UploadBuffer at 4 GiB: 6x throughput improvement (168 → 1,008 MB/s), which is roughly 8 Gbps against the VM's ~12.5 Gbps network
  • DownloadBuffer at 4 GiB: 5.4x throughput improvement (151 → 822 MB/s)
  • Small blobs (<256 MB) see minimal change — single-PUT path is unaffected
  • Numbers scale with VM size; larger VMs with higher network bandwidth (e.g., 100 Gbps) will see proportionally higher absolute throughput
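The improvement percentages in the tables can be reproduced from the raw throughput figures. The sketch below recomputes the 4 GiB rows and converts the UploadBuffer result to gigabits per second for comparison with the ~12.5 Gbps figure quoted above. It assumes MB/s means 10^6 bytes per second; the function names are illustrative only.

```go
package main

import "fmt"

// improvementPercent returns the relative gain of newRate over oldRate, in percent.
func improvementPercent(oldRate, newRate float64) float64 {
	return (newRate - oldRate) / oldRate * 100
}

// gbps converts a rate in MB/s to gigabits per second.
// Assumption: 1 MB = 10^6 bytes.
func gbps(mbPerSec float64) float64 {
	return mbPerSec * 8 / 1000
}

func main() {
	// 4 GiB rows from the benchmark tables: old rate, new rate in MB/s.
	rows := []struct {
		name     string
		old, new float64
	}{
		{"UploadBuffer", 168, 1008},
		{"UploadStream", 86, 542},
		{"DownloadBuffer", 151, 822},
	}
	for _, r := range rows {
		fmt.Printf("%-14s %+.0f%% (%.1fx)\n", r.name, improvementPercent(r.old, r.new), r.new/r.old)
	}
	fmt.Printf("UploadBuffer at 1,008 MB/s is about %.1f Gbps\n", gbps(1008))
}
```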

Opt-out

Customers who experience issues can revert to the old default without code changes:

export AZURE_STORAGE_USE_LEGACY_DEFAULT_CONCURRENCY=true
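For readers wondering how such an opt-out could be detected, here is a rough sketch. It assumes that only the value "true" (case-insensitive, surrounding whitespace ignored) enables the legacy behavior; the PR's actual parsing rules are not shown in this conversation, and useLegacyDefault is a hypothetical name. The lookup function is passed in so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Environment variable name taken from the PR description.
const legacyEnvVar = "AZURE_STORAGE_USE_LEGACY_DEFAULT_CONCURRENCY"

// useLegacyDefault reports whether the opt-out is enabled.
// Assumption: only "true" (any case) turns it on; the real SDK may differ.
func useLegacyDefault(getenv func(string) string) bool {
	return strings.EqualFold(strings.TrimSpace(getenv(legacyEnvVar)), "true")
}

func main() {
	// Read from the real process environment.
	fmt.Println("opt-out from environment:", useLegacyDefault(os.Getenv))

	// Simulate the variable being set, without modifying the environment.
	fake := func(string) string { return "true" }
	fmt.Println("opt-out when set to true:", useLegacyDefault(fake))
}
```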

…, azfile, azdatalake

Update default concurrency for uploads and downloads from a fixed value
of 5 (or 1 for stream uploads) to Clamp(NumCPU, 8, 96), matching the
Rust SDK implementation. This improves throughput on modern multi-core
VMs while clamping prevents resource exhaustion on small or very large
machines.

Adds AZURE_STORAGE_USE_LEGACY_DEFAULT_CONCURRENCY=true env var opt-out
to revert to the previous default of 5.
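The resource concern behind the clamp can be made concrete. The option documentation changed in this PR states that each concurrent upload creates a buffer of size BlockSize, so buffer memory grows linearly with concurrency. The sketch below estimates that footprint for a single transfer using the 4 MiB chunk size from the benchmarks; peakBufferBytes is an illustrative name, and the estimate ignores any other allocations.

```go
package main

import "fmt"

const mib = 1 << 20

// peakBufferBytes estimates the buffer memory held by one transfer,
// assuming one buffer of blockSize bytes per concurrent worker.
func peakBufferBytes(concurrency int, blockSize int64) int64 {
	return int64(concurrency) * blockSize
}

func main() {
	blockSize := int64(4 * mib) // chunk size used in the PR's benchmarks
	// Old defaults (1 and 5), then the new lower bound, the benchmarked value, and the upper bound.
	for _, c := range []int{1, 5, 8, 32, 96} {
		fmt.Printf("concurrency=%2d -> %3d MiB of buffers\n", c, peakBufferBytes(c, blockSize)/mib)
	}
}
```

At the upper bound of 96 and a 4 MiB block size, this estimate comes to 384 MiB per transfer, compared with 20 MiB at the old default of 5.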
Copilot AI review requested due to automatic review settings May 25, 2026 12:26
@github-actions github-actions Bot added the Storage Storage Service (Queues, Blobs, Files) label May 25, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR updates storage transfer defaults so azblob, azfile, and azdatalake choose parallelism from CPU count instead of fixed legacy values, with an environment-variable opt-out for compatibility.

Changes:

  • Adds DefaultConcurrencyValue() in shared storage helpers and applies it to batch transfers and stream uploads.
  • Updates option comments to describe CPU-based concurrency defaults.
  • Adds unit coverage for default concurrency selection and live benchmarks for azblob parallelism.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 11 comments.

| File | Description |
|---|---|
| sdk/storage/azfile/internal/shared/batch_transfer.go | Adds CPU-based default concurrency helper and uses it for batch transfers. |
| sdk/storage/azfile/internal/shared/batch_transfer_test.go | Adds tests for concurrency bounds, CPU matching, determinism, and legacy env var. |
| sdk/storage/azfile/file/models.go | Updates file transfer option docs and stream upload defaulting. |
| sdk/storage/azdatalake/internal/shared/batch_transfer.go | Adds CPU-based default concurrency helper and uses it for batch transfers. |
| sdk/storage/azdatalake/internal/shared/batch_transfer_test.go | Adds tests for concurrency bounds, CPU matching, determinism, and legacy env var. |
| sdk/storage/azdatalake/file/models.go | Updates datalake transfer option docs and stream upload defaulting. |
| sdk/storage/azblob/internal/shared/shared_test.go | Adds tests for shared concurrency default behavior. |
| sdk/storage/azblob/internal/shared/batch_transfer.go | Adds CPU-based default concurrency helper and uses it for batch transfers. |
| sdk/storage/azblob/blockblob/parallelism_bench_test.go | Adds live benchmarks for upload/download parallelism behavior. |
| sdk/storage/azblob/blockblob/models.go | Updates block blob transfer option docs and stream upload defaulting. |
| sdk/storage/azblob/blockblob/chunkwriting_test.go | Adjusts buffer allocation assertion for higher default stream concurrency. |
| sdk/storage/azblob/blob/models.go | Updates blob download option docs for CPU-based defaults. |
| sdk/storage/azblob/blob/constants.go | Deprecates fixed DefaultConcurrency and exposes DefaultConcurrencyValue(). |

Comment thread sdk/storage/azfile/internal/shared/batch_transfer_test.go Outdated
Comment thread sdk/storage/azdatalake/internal/shared/batch_transfer_test.go Outdated
Comment thread sdk/storage/azblob/internal/shared/shared_test.go
Comment thread sdk/storage/azblob/blockblob/parallelism_bench_test.go Outdated
Comment thread sdk/storage/azblob/blockblob/parallelism_bench_test.go Outdated
Comment thread sdk/storage/azblob/blob/constants.go
Comment thread sdk/storage/azblob/blockblob/models.go Outdated
Comment thread sdk/storage/azfile/file/models.go Outdated
Comment thread sdk/storage/azdatalake/file/models.go Outdated
Comment thread sdk/storage/azblob/blob/models.go Outdated
 func (u *UploadStreamOptions) setDefaults() {
 	if u.Concurrency == 0 {
-		u.Concurrency = 1
+		u.Concurrency = int(shared.DefaultConcurrencyValue())
Member

This method returns 5 if the AZURE_STORAGE_USE_LEGACY_DEFAULT_CONCURRENCY environment variable is set, whereas the previous default concurrency in UploadStreamOptions was 1. With the opt-out enabled, stream uploads would therefore not revert to their old behavior.
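One possible way to address this concern, sketched here purely for illustration, is to let each call site supply its own legacy value instead of sharing a single fallback of 5. The function defaultConcurrencyOr and the two constants are hypothetical; they are not part of the PR or the SDK. The CPU count and opt-out flag are passed as parameters so the behavior is easy to check.

```go
package main

import "fmt"

const (
	legacyBufferConcurrency = 5 // old default for buffer/file transfers, per the PR summary
	legacyStreamConcurrency = 1 // old default for stream uploads, per the PR summary
)

// defaultConcurrencyOr is a hypothetical variant of the default helper:
// when the opt-out is on it returns the caller's own legacy value,
// otherwise it returns numCPU clamped to [8, 96].
func defaultConcurrencyOr(legacy, numCPU int, useLegacy bool) int {
	if useLegacy {
		return legacy
	}
	if numCPU < 8 {
		return 8
	}
	if numCPU > 96 {
		return 96
	}
	return numCPU
}

func main() {
	const cpus = 32
	fmt.Println("stream, opt-out on: ", defaultConcurrencyOr(legacyStreamConcurrency, cpus, true))
	fmt.Println("buffer, opt-out on: ", defaultConcurrencyOr(legacyBufferConcurrency, cpus, true))
	fmt.Println("stream, opt-out off:", defaultConcurrencyOr(legacyStreamConcurrency, cpus, false))
}
```

With this shape, setting the opt-out would restore 1 for stream uploads and 5 for buffer and file transfers, matching the old defaults listed in the summary.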

 func (u *UploadStreamOptions) setDefaults() {
 	if u.Concurrency == 0 {
-		u.Concurrency = 1
+		u.Concurrency = shared.DefaultConcurrencyValue()
Member

Same as the previous comment: with the legacy opt-out set, this returns 5 rather than the previous default of 1.

 func (u *UploadStreamOptions) setDefaults() {
 	if u.Concurrency == 0 {
-		u.Concurrency = 1
+		u.Concurrency = int(shared.DefaultConcurrencyValue())
Member

Same as previous

// DefaultConcurrency is the legacy default number of blocks downloaded or uploaded in parallel.
//
// Deprecated: Use DefaultConcurrencyValue() instead, which returns a value based on CPU core count.
DefaultConcurrency = shared.DefaultConcurrency //nolint:staticcheck // intentional re-export of deprecated const for backward compat
Member

Add changelog entries in all three packages.

Comment on lines +338 to 340
// Each concurrent upload will create a buffer of size BlockSize. The default is based on
// CPU core count (min 8, max 96). Set AZURE_STORAGE_USE_LEGACY_DEFAULT_CONCURRENCY=true to revert to the previous default.
Concurrency int
Member

Please check if data validation tests are already there for all the affected upload and download methods.
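To illustrate what a data validation check for a parallel transfer looks like in general, here is a self-contained sketch. It does not use the Azure SDK: parallelCopy is a hypothetical stand-in for a chunked upload or download that moves blocks with a bounded number of workers, and the check compares SHA-256 digests of the source and the result. It assumes blockSize and concurrency are both greater than zero.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// parallelCopy copies src into a new buffer in blockSize chunks with at most
// `concurrency` chunks in flight. It stands in for a chunked transfer.
func parallelCopy(src []byte, blockSize, concurrency int) []byte {
	dst := make([]byte, len(src))
	sem := make(chan struct{}, concurrency) // bounds the number of active workers
	var wg sync.WaitGroup
	for off := 0; off < len(src); off += blockSize {
		end := off + blockSize
		if end > len(src) {
			end = len(src) // the final block may be short
		}
		sem <- struct{}{}
		wg.Add(1)
		go func(off, end int) {
			defer wg.Done()
			defer func() { <-sem }()
			copy(dst[off:end], src[off:end]) // each worker writes a disjoint range
		}(off, end)
	}
	wg.Wait()
	return dst
}

// makePayload builds deterministic test data of n bytes.
func makePayload(n int) []byte {
	p := make([]byte, n)
	for i := range p {
		p[i] = byte(i * 31 % 251)
	}
	return p
}

func main() {
	src := makePayload(10*1024 + 7) // deliberately not a multiple of the block size
	want := sha256.Sum256(src)
	// Old defaults (1 and 5) and the benchmarked new value (32).
	for _, c := range []int{1, 5, 32} {
		got := sha256.Sum256(parallelCopy(src, 1024, c))
		fmt.Printf("concurrency=%d digest match=%v\n", c, got == want)
	}
}
```

A real test for the affected methods would replace parallelCopy with the actual upload followed by a download, keeping the digest comparison and the odd payload size.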
