
Integer overflow when uploading large files with chunked upload (affects >2GB vars.h5 uploads) #49

@josegarciamanteiga

Description


Bug summary

When uploading large files (such as a vars.h5 file over 2GB) with the .upload_file_chunked() function, an integer overflow occurs because chunk offsets are calculated with 32-bit integers. For large files this results in silent data corruption, failed uploads, or chunks being read from incorrect positions in the file.

Where it happens

In R/client.R, lines 67 and 95 (also refer to usage in both .upload_chunk_server and .upload_chunk_r2):

offset <- chunk_idx * chunk_size

Both chunk_idx and chunk_size are typically created with as.integer() (32-bit signed integers), so the product overflows once it exceeds 2,147,483,647 bytes (about 2GB). In R, integer overflow does not wrap around silently; it produces NA with a warning, so the offsets for chunks beyond the 2GB boundary are invalid and the wrong part of the file can end up being uploaded.
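A minimal demonstration of the failure mode (chunk sizes and indices here are made up for illustration, not taken from R/client.R):

```r
# Multiplying two 32-bit integers past .Machine$integer.max (2147483647)
# yields NA with an "NAs produced by integer overflow" warning in R.
chunk_size <- as.integer(64 * 1024 * 1024)  # 64 MiB per chunk
chunk_idx  <- as.integer(40)                # 40 * 64 MiB = 2.5 GiB > 2^31 - 1

offset <- chunk_idx * chunk_size
print(offset)  # NA, with an integer-overflow warning
```

Any seek() or read based on that NA offset will then misbehave, which matches the symptoms above.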

How to reproduce

  • Attempt to upload a vars.h5 file larger than 2GB (e.g. 3GB, 10GB, 45GB) using CyteTypeR against the CyteType API
  • The upload appears to run, but silently sends wrong data for every chunk past the 2GB offset boundary

Impact

  • Files larger than 2GB cannot be reliably uploaded
  • Data can be corrupted/invalid server-side

Suggested fix

  • Compute offset and related variables as doubles (numeric) rather than integers; R doubles represent integers exactly up to 2^53, far beyond any realistic file size. Wrapping with as.numeric() is enough:
    offset <- as.numeric(chunk_idx) * as.numeric(chunk_size)
  • Review the upstream code to ensure size, chunk_size, and n_chunks are also numeric, so integer overflow cannot occur anywhere in the chunking logic
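A sketch of what the corrected offset computation could look like. The function name and signature are hypothetical (the actual code lives in .upload_file_chunked / .upload_chunk_server / .upload_chunk_r2); only the as.numeric() coercion is the point:

```r
# Hypothetical helper: compute all chunk offsets for a file as doubles.
# Doubles hold integers exactly up to 2^53, so no overflow below ~9 PB.
chunk_offsets <- function(file_size, chunk_size = 64 * 1024 * 1024) {
  file_size  <- as.numeric(file_size)
  chunk_size <- as.numeric(chunk_size)
  n_chunks   <- ceiling(file_size / chunk_size)

  # 0-based chunk indices; numeric arithmetic throughout, never integer
  (seq_len(n_chunks) - 1) * chunk_size
}
```

For a 3GB file with 64 MiB chunks this yields 48 offsets, the largest being 47 * 67108864 = 3154116608, well within exact double range, whereas the same multiplication in 32-bit integers would produce NA.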

Additional context

Detected while using chunked uploads for large files in CyteTypeR. If you need a minimal reproducible example, let me know!
