Bug summary
When uploading large files (such as a vars.h5 larger than 2GB) with the .upload_file_chunked() function, the chunk offsets are computed with 32-bit integer arithmetic, which can overflow. For large files this results in silent data corruption, failed uploads, or chunks being read from incorrect positions.
Where it happens
In R/client.R, lines 67 and 95 (also refer to usage in both .upload_chunk_server and .upload_chunk_r2):
offset <- chunk_idx * chunk_size
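Both chunk_idx and chunk_size are usually created with as.integer() (32-bit signed integers), so the product overflows once it exceeds 2,147,483,647 bytes (about 2GB). In R, an overflowing integer multiplication returns NA with a warning rather than a valid number. For files >2GB, the offsets of the upper chunks are therefore invalid, so those chunks are not read from the correct position in the file.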
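The overflow can be checked in isolation, without uploading anything. The sketch below assumes a 10 MiB chunk size and 0-based chunk indices; both are picked for illustration and are not taken from R/client.R.

```r
# Assumed values for illustration only; the real chunk size may differ.
chunk_size <- as.integer(10 * 1024 * 1024)   # 10485760 bytes, stored as integer

# Last index whose offset still fits in a 32-bit signed integer.
last_ok <- 204L * chunk_size                 # 2139095040, still an integer

# One chunk further the product exceeds .Machine$integer.max (2147483647).
# Base R returns NA_integer_ and emits a warning, suppressed here for clarity.
int_offset <- suppressWarnings(205L * chunk_size)

# The same product computed in doubles is exact.
dbl_offset <- as.numeric(205L) * as.numeric(chunk_size)

print(int_offset)
print(dbl_offset)
```

Doubles represent integers exactly up to 2^53, which is far beyond any realistic file size.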
How to reproduce
- Attempt to upload a vars.h5 file that is larger than 2GB (e.g. 3GB, 10GB, 45GB) to the CyteType API using CyteTypeR
- Upload will appear to run but will silently send wrong data after the 2GB offset boundary
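If no multi-gigabyte file is at hand, the affected chunks can be estimated from the arithmetic alone. This is a rough sketch, assuming a 3 GiB file, 10 MiB chunks, and 0-based chunk indices; none of these values are confirmed from the package source.

```r
# Assumptions for illustration: 3 GiB file, 10 MiB chunks, 0-based indices.
file_size  <- 3 * 1024^3                        # double, like file.size() returns
chunk_size <- as.integer(10 * 1024 * 1024)      # integer, as described in the report
n_chunks   <- ceiling(file_size / chunk_size)   # number of chunks needed

chunk_idx <- seq_len(n_chunks) - 1L             # integer vector 0, 1, 2, ...

# Integer * integer: every offset past .Machine$integer.max becomes NA.
offsets <- suppressWarnings(chunk_idx * chunk_size)

first_bad <- chunk_idx[which(is.na(offsets))[1]]  # first chunk with no valid offset
n_bad     <- sum(is.na(offsets))                  # how many chunks are affected

cat("chunks:", n_chunks, "first bad index:", first_bad, "affected:", n_bad, "\n")
```

Under these assumptions roughly a third of the chunks end up without a usable offset.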
Impact
- Files larger than 2GB cannot be reliably uploaded
- Data can be corrupted/invalid server-side
Suggested fix
- Make sure offset and related variables are computed using doubles (numeric) and not integers. Just wrap with as.numeric():
offset <- as.numeric(chunk_idx) * as.numeric(chunk_size)
- Review upstream code: ensure size, chunk_size, n_chunks are also numeric to avoid integer overflow
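A minimal sketch of how the fix might look when reading a chunk. The helpers chunk_offset() and read_chunk() are hypothetical names used for illustration, not functions from CyteTypeR, and the 0-based chunk index is an assumption.

```r
# Hypothetical helper: compute the byte offset in doubles so it cannot overflow.
chunk_offset <- function(chunk_idx, chunk_size) {
  as.numeric(chunk_idx) * as.numeric(chunk_size)
}

# Hypothetical helper: read one chunk of a file as raw bytes.
read_chunk <- function(path, chunk_idx, chunk_size) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  # seek() takes a numeric position, so a double offset is passed directly.
  seek(con, where = chunk_offset(chunk_idx, chunk_size), origin = "start")
  readBin(con, what = "raw", n = chunk_size)
}

# Small self-check on a 10-byte temporary file with 4-byte chunks.
tmp <- tempfile()
writeBin(as.raw(0:9), tmp)
print(read_chunk(tmp, 1L, 4L))   # bytes 4..7 of the file
print(read_chunk(tmp, 2L, 4L))   # the final, shorter chunk
unlink(tmp)
```

Converting inside the helper keeps the call sites unchanged even if chunk_idx and chunk_size are still passed in as integers.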
References
Detected using chunk uploads for large files in CyteTypeR. If you need a minimal reproducible example let me know!