Description
The MLPerf Storage benchmark (DLIO Upstream) currently uses s3torchconnector for S3-compatible storage backends. However, this library does not support HTTP 307 redirects, which prevents S3-compatible distributed storage systems like NVIDIA AIStore from working out of the box.
Problem
AIStore is fully S3-compatible, but its architecture uses HTTP 307 redirects for load balancing — the proxy node redirects requests to the target node where the data resides. The s3torchconnector used in DLIO does not handle these redirects, causing request failures.
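To make the failure mode concrete, here is a minimal, self-contained sketch of the proxy-redirects-to-target pattern. The two in-process HTTP servers are stand-ins for an AIStore proxy and target (the ports, paths, and payload are invented for illustration); the point is that a client which follows 307 redirects gets the object, while one that treats 307 as a terminal response fails at the proxy hop:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# "Target" server: holds the object bytes, like an AIStore target node.
class Target(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"object-bytes"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence request logging
        pass

# "Proxy" server: answers every GET with a 307 redirect to the target,
# mimicking AIStore's redirect-based load balancing.
def make_proxy(target_port):
    class Proxy(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(307)
            self.send_header("Location",
                             f"http://127.0.0.1:{target_port}{self.path}")
            self.end_headers()
        def log_message(self, *args):
            pass
    return Proxy

def serve(handler):
    srv = HTTPServer(("127.0.0.1", 0), handler)  # ephemeral port
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv

target = serve(Target)
proxy = serve(make_proxy(target.server_address[1]))

# urllib's default opener follows 307 redirects, so this succeeds.
# A client that stops at the 307 (as s3torchconnector reportedly does)
# would never reach the target and the read would fail.
with urlopen(f"http://127.0.0.1:{proxy.server_address[1]}/bucket/key") as resp:
    data = resp.read()
print(data)
```

The fix proposed below amounts to giving the S3 data path the same follow-the-`Location`-header behavior that `urllib` shows here.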
This was also reported upstream in DLIO: argonne-lcf/dlio_benchmark#320
As a workaround, I added native AIStore Python SDK support directly in DLIO and was able to successfully benchmark my cluster. However, this workaround doesn't carry over to the MLPerf Storage benchmark.
Proposal
Two possible approaches:
- Support S3 redirects: Update the S3 data-loading path to handle HTTP 307 redirects. This would benefit any S3-compatible storage system that uses redirect-based load balancing, not just AIStore.
- Allow custom storage plugins: Provide a plugin mechanism so that storage systems like AIStore can register their own data reader (e.g., using the `aistore` Python SDK) and participate in the benchmark natively. This would make the benchmark more extensible and allow more storage vendors to compete on a level playing field.
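The plugin approach could look something like the following registry sketch. All names here (`StorageReader`, `register_reader`, `get_reader`, `AIStoreReader`) are hypothetical and do not reflect DLIO's actual interfaces; a real `AIStoreReader` would delegate to the `aistore` SDK instead of the stub shown:

```python
from abc import ABC, abstractmethod

# Hypothetical reader interface a storage plugin would implement.
class StorageReader(ABC):
    @abstractmethod
    def read(self, bucket: str, key: str) -> bytes:
        """Return the raw bytes of one object."""

# Registry mapping a URI scheme (e.g. "s3", "ais") to a reader class.
_READERS: dict[str, type] = {}

def register_reader(scheme: str):
    def wrap(cls):
        _READERS[scheme] = cls
        return cls
    return wrap

def get_reader(scheme: str) -> StorageReader:
    return _READERS[scheme]()

@register_reader("ais")
class AIStoreReader(StorageReader):
    # Stub: a real implementation would call into the aistore Python SDK
    # (e.g. create a client against the cluster endpoint and fetch the
    # object), which natively handles the proxy/target redirect dance.
    def read(self, bucket: str, key: str) -> bytes:
        return f"ais://{bucket}/{key}".encode()

# The benchmark core would resolve the reader from the configured scheme:
print(get_reader("ais").read("train", "sample-0001"))
```

The design choice here is that the benchmark core only depends on the abstract interface, so vendors ship a reader class without patching the benchmark itself.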
Context
- AIStore: https://github.com/NVIDIA/aistore
- AIStore Python SDK: https://pypi.org/project/aistore/
- Related DLIO issue: "Add Native AIStore Storage Support" (argonne-lcf/dlio_benchmark#320)