fix(s3): fix binary data corruption on upload/download #53

Open
zaru wants to merge 1 commit into vercel-labs:main from zaru:fix/s3-binary-data-corruption

@zaru zaru commented Apr 8, 2026

Summary

  • PutObject read the request body with c.req.text(), which corrupted binary files (docx, xlsx, etc.) through lossy UTF-8 conversion
  • Switch to c.req.arrayBuffer() and store the body as base64; on GET, decode it back to a Buffer and return it with c.body()
  • Fix ETag and content_length to reflect the actual binary byte count
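
For illustration, here is a minimal standalone sketch of why a text round trip is lossy for binary data while a base64 round trip is not. It uses only Node's built-in Buffer and TextDecoder rather than the emulator's code; the sample bytes and variable names are assumptions chosen for the example.

```typescript
// Sample bytes (an assumption for this sketch): the "PK" zip magic used by
// docx/xlsx files, followed by bytes that are not valid UTF-8.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe, 0x80, 0x00]);

// Decoding as UTF-8 replaces each invalid byte with U+FFFD, so the
// original byte values are lost.
const asText = new TextDecoder("utf-8").decode(original);
const viaText = Buffer.from(asText, "utf-8");

// Base64 maps every byte sequence to ASCII and back without loss.
const viaBase64 = Buffer.from(original.toString("base64"), "base64");

const textRoundTripOk = original.equals(viaText);
const base64RoundTripOk = original.equals(viaBase64);

console.log({ textRoundTripOk, base64RoundTripOk });
console.log(`original: ${original.byteLength} bytes, via text: ${viaText.byteLength} bytes`);
```

The text round trip also changes the byte count, which is why an ETag or content_length computed from the decoded string would not match the uploaded file.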

Reproduce

# Upload Word file
curl -X PUT http://localhost:4000/s3/emulate-default/sample.docx \
  -H "Authorization: Bearer token" \
  -H "Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
  -H "x-amz-meta-author: test" \
  --data-binary @"./sample.docx"

# Download Word file
curl http://localhost:4000/s3/emulate-default/sample.docx \
  -H "Authorization: Bearer token" -O

# Open Word file, but it's broken
open sample.docx

Changes

  • c.req.text() → c.req.arrayBuffer() + base64 encoding for storage
  • c.text() → c.body() + base64 decoding for retrieval
  • Widen md5() type signature to accept string | Buffer
  • Add binary data round-trip test
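
The changes above can be sketched roughly as follows. This is not the PR's actual diff: the StoredObject shape and the encodeForStorage / decodeForResponse helpers are hypothetical names for this example, and the quoted ETag format is an assumption. Only the md5() signature accepting string | Buffer and the base64 storage idea come from the description.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stored-object shape; the emulator's real schema is not shown in the PR text.
interface StoredObject {
  body: string; // base64-encoded object bytes
  etag: string;
  content_length: number;
}

// md5() widened to accept either a string or raw bytes.
function md5(data: string | Buffer): string {
  return createHash("md5").update(data).digest("hex");
}

// PUT side: take the bytes from c.req.arrayBuffer() and encode them for storage.
function encodeForStorage(raw: ArrayBuffer): StoredObject {
  const bytes = Buffer.from(raw);
  return {
    body: bytes.toString("base64"),
    etag: `"${md5(bytes)}"`, // hash of the raw bytes, not of the base64 text
    content_length: bytes.byteLength, // actual byte count of the upload
  };
}

// GET side: decode back to a Buffer, which a handler could then return via c.body().
function decodeForResponse(obj: StoredObject): Buffer {
  return Buffer.from(obj.body, "base64");
}
```

Computing the ETag and content_length from the raw bytes before encoding keeps them independent of the storage representation, so they would stay valid if the store later moved away from base64.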

I've confirmed that base64 encoding resolves the binary corruption issue, but I'm not sure if this is the best approach for the project. If there's a more suitable way to handle binary data storage, I'd appreciate any feedback or suggestions. Happy to revise the implementation accordingly.

vercel bot commented Apr 8, 2026

@zaru is attempting to deploy a commit to the Vercel Labs Team on Vercel.

A member of the Team first needs to authorize it.
