[hub] sharded safetensors: 3 → 2 HTTP requests per shard (closes #1979) #2194
Draft
999purple999 wants to merge 1 commit into
Closes huggingface#1979

The old sharded path issues 3 HTTP requests per shard (downloadFile's fileDownloadInfo probe, then WebBlob.slice(0,8) length read, then WebBlob.slice(8, 8+len) header body). For heavily sharded models that fan-out is prohibitive: DeepSeek-Math-V2 (163 shards) fails 100% of the time in upstream benches at 3 x 163 = 489 requests.

This patch adds parseSingleFileFast() that issues 2 direct range requests against the resolve URL (bytes=0-7 for the LE header length, then bytes=8-N for the header body), bypassing fileDownloadInfo entirely. The probe metadata (size/etag/xet) is unused for sharded header parsing.

Safety:
- Auth header forwarded identically
- MAX_HEADER_LENGTH cap enforced before issuing the body request
- Non-206 responses are refused (a 200 here means the server is streaming the whole multi-GB shard body; we cancel and throw rather than buffer it into RAM)

Single-file (non-sharded) entry path is untouched; xet single-file checkpoints still flow through downloadFile's reconstruction logic.

Tests: 3 new unit tests with mocked fetch verify (a) exactly 2 requests per shard, (b) 200 rejection, (c) oversized-header rejection. Existing integration tests against real Hub URLs (bigscience/bloom etc.) continue to exercise the sharded path.
Fix for huggingface.js issue #1979: sharded safetensors metadata 3 → 2 HTTP/shard

Author: Francesco Pernice Botta (999purple999)
Branch: fix/1979-sharded-safetensors-2hops
Issue: huggingface/huggingface.js#1979

Touched files:
- packages/hub/src/lib/parse-safetensors-metadata.ts (114 inserts / 1 delete)
- packages/hub/src/lib/parse-safetensors-metadata-fast.spec.ts (new, 3 unit tests, fetch mocked)

The problem in 6 lines

parseSafetensorsMetadata on a sharded repo does 3 HTTP requests per shard:
1. fileDownloadInfo probe (Range: bytes=0-0), to learn size + etag + xet redirect
2. WebBlob.slice(0, 8).arrayBuffer(), to read the 8-byte little-endian header length
3. WebBlob.slice(8, 8+len).arrayBuffer(), to read the JSON header body

For heavily sharded models the request fan-out becomes prohibitive: for example, DeepSeek-Math-V2 (163 shards) needs 3 x 163 = 489 requests.
The size/etag from step 1 is never used when parsing sharded headers — all the
caller needs is the JSON body. The probe is wasted.
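To make the byte layout and the request arithmetic concrete, here is a minimal sketch of decoding the 8-byte little-endian header length and of counting requests per model. The helper names readHeaderLength and requestsFor are hypothetical and do not appear in the patch; the length is combined from two 32-bit halves, which is only exact for values below 2^53.

```typescript
// Hypothetical helpers for illustration; names are not taken from the patch.

/** Decode the 8-byte little-endian u64 that prefixes a safetensors file. */
function readHeaderLength(prefix: Uint8Array): number {
  if (prefix.byteLength !== 8) {
    throw new RangeError(`expected 8 bytes, got ${prefix.byteLength}`);
  }
  const view = new DataView(prefix.buffer, prefix.byteOffset, prefix.byteLength);
  const low = view.getUint32(0, true); // least-significant 4 bytes come first
  const high = view.getUint32(4, true);
  return high * 4294967296 + low; // 4294967296 = 2^32
}

/** Total HTTP requests needed to read every shard header. */
function requestsFor(shards: number, requestsPerShard: number): number {
  return shards * requestsPerShard;
}

// A header of 0x0100 = 256 bytes is stored as 00 01 00 00 00 00 00 00.
const prefix = new Uint8Array([0x00, 0x01, 0, 0, 0, 0, 0, 0]);
console.log(readHeaderLength(prefix)); // 256
console.log(requestsFor(163, 3), requestsFor(163, 2)); // 489 326
```

For the 163-shard case this is the difference between 489 and 326 requests.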
The fix
New private helper parseSingleFileFast(path, params) that issues exactly 2 direct range requests against the resolve URL, bypassing downloadFile / fileDownloadInfo.

fetchAllHeaders() is rewired to call parseSingleFileFast instead of parseSingleFile for every shard. The single-file (non-sharded) entry path is unchanged: there is no benefit there and it preserves xet compatibility for non-sharded checkpoints.
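The two-request flow can be sketched roughly as follows. This is not the code from the patch: fetchHeaderFast, rangeRequest, FetchLike and MinimalResponse are illustrative names, the fetch function is passed in explicitly so the sketch stays testable, and the byte value chosen for the 25 MB cap is an assumption.

```typescript
// Sketch only. fetchHeaderFast, rangeRequest, FetchLike and MinimalResponse are
// illustrative names, not the identifiers used in the patch.
interface MinimalResponse {
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
  body?: { cancel(): Promise<void> } | null;
}
type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<MinimalResponse>;

// Assumed byte value for the "25 MB" cap mentioned in this document.
const MAX_HEADER_LENGTH = 25_000_000;

/** Issue one Range request and insist on 206 Partial Content. */
async function rangeRequest(
  fetchFn: FetchLike,
  url: string,
  start: number,
  end: number, // inclusive, as in HTTP Range
  accessToken?: string,
): Promise<ArrayBuffer> {
  const headers: Record<string, string> = { Range: `bytes=${start}-${end}` };
  if (accessToken) {
    headers.Authorization = `Bearer ${accessToken}`;
  }
  const response = await fetchFn(url, { headers });
  if (response.status !== 206) {
    // A 200 means the server ignored Range and is sending the whole shard:
    // cancel the stream instead of buffering it, then fail loudly.
    await response.body?.cancel();
    throw new Error(`Expected 206 Partial Content, got ${response.status}`);
  }
  return response.arrayBuffer();
}

/** Read and parse a shard's JSON header with exactly two range requests. */
async function fetchHeaderFast(fetchFn: FetchLike, url: string, accessToken?: string): Promise<unknown> {
  // Request 1: bytes 0-7 hold the header length as a little-endian u64.
  const lengthBytes = await rangeRequest(fetchFn, url, 0, 7, accessToken);
  if (lengthBytes.byteLength !== 8) {
    throw new Error(`Expected 8 length bytes, got ${lengthBytes.byteLength}`);
  }
  const view = new DataView(lengthBytes);
  const headerLength = view.getUint32(4, true) * 4294967296 + view.getUint32(0, true);

  // Enforce the cap before asking the server for the body.
  if (headerLength === 0 || headerLength > MAX_HEADER_LENGTH) {
    throw new Error(`Header length ${headerLength} is outside (0, ${MAX_HEADER_LENGTH}]`);
  }

  // Request 2: the JSON header occupies bytes 8 .. 8 + headerLength - 1.
  const body = await rangeRequest(fetchFn, url, 8, 7 + headerLength, accessToken);
  return JSON.parse(new TextDecoder().decode(body));
}
```

Because HTTP byte ranges are inclusive, the N in bytes=8-N works out to 7 + headerLength in this sketch. A fetch-compatible function, such as a user-supplied override, can be passed as fetchFn.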
Safety invariants preserved
- Auth header (Authorization: Bearer …) forwarded identically to fileDownloadInfo
- MAX_HEADER_LENGTH = 25 MB cap enforced before issuing request #2
- Range: bytes=0-0 semantics not needed (we now want 0-7, not 0-0)
- fetch override path preserved (used by proxy / header-rewrite users)
- URL construction mirrors fileDownloadInfo exactly (bucket vs model prefix, revision encoding, raw=false)
The 200-response trap
If a misbehaving CDN returns 200 (the entire shard body) instead of 206, the old WebBlob slow path would still issue range-tagged sub-requests and behave correctly. The new fast path issues a raw Range request and trusts the server, so we must refuse a 200 response; otherwise we'd buffer a 10+ GB shard into RAM. The fix calls response.body?.cancel() and throws.

Tests
New unit tests (offline, mocked fetch): parse-safetensors-metadata-fast.spec.ts

The first test instruments fetch and asserts:
- bytes=0-7 Range header on the length-probe request
- bytes=8-… Range header on the body-read request

Existing integration tests (parse-safetensors-metadata.spec.ts)

These hit real HF Hub URLs (bigscience/bloom, Alignment-Lab-AI/ALAI-gemma-7b, hf-internal-testing/sharded-model-metadata-num-parameters). They exercise the sharded path, so they cover this change end-to-end. Run them with:
I have NOT run these locally — they need network + a clean pnpm workspace install
(~5-10 min). The CI on the PR will run them.
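The request-count and Range-header assertions described for the mocked unit tests could look roughly like the sketch below. It is framework-free and purely illustrative: recordRanges and readShardHeader are hypothetical names, readShardHeader is a stand-in for the code under test, and the 100-byte header size and example.invalid URLs are made up for the demo.

```typescript
// Illustrative only: recordRanges and readShardHeader are hypothetical names,
// and readShardHeader is a stand-in for the code under test.
type Init = { headers?: Record<string, string> };
type Fetcher = (url: string, init?: Init) => Promise<{ status: number }>;

/** Wrap a fetch-like function and record the Range header of every call. */
function recordRanges(inner: Fetcher): { fetchFn: Fetcher; ranges: string[] } {
  const ranges: string[] = [];
  const fetchFn: Fetcher = async (url, init) => {
    ranges.push(init?.headers?.Range ?? "(none)");
    return inner(url, init);
  };
  return { fetchFn, ranges };
}

// Stand-in for the fast path: one length probe, one body read per shard.
async function readShardHeader(fetchFn: Fetcher, url: string): Promise<void> {
  await fetchFn(url, { headers: { Range: "bytes=0-7" } });
  await fetchFn(url, { headers: { Range: "bytes=8-107" } }); // pretend the header is 100 bytes
}

async function demo(): Promise<string[]> {
  const { fetchFn, ranges } = recordRanges(async () => ({ status: 206 }));
  const shards = ["a.safetensors", "b.safetensors", "c.safetensors"];
  for (const shard of shards) {
    await readShardHeader(fetchFn, `https://example.invalid/${shard}`);
  }
  // Exactly two requests per shard, no probe.
  if (ranges.length !== 2 * shards.length) {
    throw new Error(`expected ${2 * shards.length} requests, got ${ranges.length}`);
  }
  return ranges;
}
```

Recording the Range header per call, instead of only counting calls, lets the test distinguish the length probe from the body read.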
How to verify locally
Push instructions (run when ready)
Trade-offs considered
- Why not fall back to parseSingleFile on non-206? Because a server that returns 200 to a Range request is streaming the whole file. Falling back to WebBlob.slice() would re-issue Range: same outcome but with extra latency. Failing loudly is correct.
- Why not change parseSingleFile too? xet single-file checkpoints rely on XetBlob's reconstruction-URL logic that lives behind fileDownloadInfo. Touching that is out of scope and risky.
- What about the fileExists + index download? Different issue (#1721 / #1704 area, already merged via #2134). Out of scope here.