
feat: add user metadata support for RLS policies #825

Open
TylerHillery wants to merge 22 commits into master from tyler/STORAGE-258/user-metadata-in-rls

Conversation

@TylerHillery
Contributor

@TylerHillery TylerHillery commented Jan 23, 2026

What kind of change does this PR introduce?

Resolves #757 by adding support for referencing the user_metadata field in RLS policies for inserts.

What is the current behavior?

Currently user_metadata is not passed into canUpload checks, so RLS policies can't reference it during file uploads.

What is the new behavior?

Now user_metadata is passed through all upload paths (standard HTTP, S3, TUS, multipart) so RLS policies can use it.

Additional context

  • The main change was updating canUpload to accept userMetadata, and I updated all callers to pass userMetadata in. Because canUpload gets called in prepareUpload, which in turn gets called in upload, I also updated all upload calls to ensure userMetadata is passed through. I found it difficult to be sure I caught every case, though; for example, it was very easy to miss that uploadNewObject needed to be updated in uploadFromRequest. I'm open to feedback on whether there is a more robust solution that could be enforced with TypeScript types.
  • The part I found most confusing to update was the canUpload checks for multipart uploads. It required reordering certain operations so that user_metadata was available. I wasn't entirely sure if these changes were okay.
  • I added some context as to why I moved userMetadata from the FileUpload interface to UploadRequest, but I wasn't sure if this was the right approach, so I would appreciate some feedback.

@coveralls

coveralls commented Jan 23, 2026

Pull Request Test Coverage Report for Build 21612994975

Details

  • 135 of 139 (97.12%) changed or added relevant lines in 9 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.04%) to 75.596%

Changes missing coverage:

  • src/http/routes/object/getSignedUploadURL.ts: 18 of 20 changed/added lines covered (90.0%)
  • src/http/routes/tus/lifecycle.ts: 22 of 24 changed/added lines covered (91.67%)

Files with coverage reduction:

  • src/storage/protocols/s3/s3-handler.ts: 1 new missed line (83.59%)

Totals:

  • Change from base Build 21589555306: +0.04%
  • Covered Lines: 26008
  • Relevant Lines: 34122

💛 - Coveralls

@TylerHillery TylerHillery marked this pull request as ready for review January 23, 2026 23:27
@TylerHillery TylerHillery requested a review from a team January 23, 2026 23:27
@fenos
Contributor

fenos commented Jan 26, 2026

@TylerHillery awesome! Can we also pass in the metadata into the canUpload?
User metadata is metadata passed by the user, whereas metadata is the file metadata such as size, content-type, etc.

@TylerHillery
Contributor Author

@TylerHillery awesome! Can we also pass in the metadata into the canUpload? User metadata is metadata passed by the user where metadata is the file metadata such as size, content-type etc...

I thought we couldn't do this because we don't get the objectMetadata until the object has been uploaded. We check canUpload before uploading to object storage to test RLS. Can you confirm if this is the case?

We could switch the order, but then we would have to make sure to delete the object if the user can't upload, and we would be doing the "expensive" upload work when we didn't have to. It's much quicker to test RLS first and skip uploading entirely when it fails.

Here is the upload method on the Uploader class with notes:

  /**
   * Extracts file information from the request and upload the buffer
   * to the remote storage
   * @param request
   * @param options
   */
  async upload(request: UploadRequest) {
    // NOTE: canUpload() gets called inside prepareUpload()
    const version = await this.prepareUpload(request)

    try {
      const file = request.file

      const s3Key = this.location.getKeyLocation({
        tenantId: this.db.tenantId,
        bucketId: request.bucketId,
        objectName: request.objectName,
      })
      // NOTE: we don't get object metadata until here
      const objectMetadata = await this.backend.uploadObject(
        storageS3Bucket,
        s3Key,
        version,
        file.body,
        file.mimeType,
        file.cacheControl,
        request.signal
      )

      if (request.file.xRobotsTag) {
        objectMetadata.xRobotsTag = request.file.xRobotsTag
      }

      if (file.isTruncated()) {
        throw ERRORS.EntityTooLarge()
      }

      return this.completeUpload({
        ...request,
        version,
        objectMetadata: objectMetadata,
        userMetadata: { ...request.userMetadata },
      })
    } catch (e) {
      await ObjectAdminDelete.send({
        name: request.objectName,
        bucketId: request.bucketId,
        tenant: this.db.tenant(),
        version: version,
        reqId: this.db.reqId,
      })
      throw e
    }
  }

@fenos
Contributor

fenos commented Jan 27, 2026

I thought we couldn't do this because we don't get the objectMetadata until the object has been uploaded.
You are correct on this, but potentially we could rely on:

  • content-length of the request for the file size
  • content-type for the mimeType

Other fields like etag, lastModified, it's fine if they are empty on the first RLS check
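The idea above, building a partial metadata object for the first RLS check from request headers before the object exists in storage, could be sketched as follows. The PartialObjectMetadata shape and the field names (size, mimetype) are assumptions for illustration, not the project's actual types:

```typescript
// Hypothetical sketch: derive partial object metadata for the first RLS
// check from request headers, before the upload to object storage happens.
interface PartialObjectMetadata {
  size?: number
  mimetype?: string
  // etag / lastModified intentionally absent on the first RLS check
}

function metadataFromHeaders(
  headers: Record<string, string | undefined>
): PartialObjectMetadata {
  const md: PartialObjectMetadata = {}
  const len = headers['content-length']
  // content-length gives the file size, when present and numeric
  if (len !== undefined && /^\d+$/.test(len)) {
    md.size = Number(len)
  }
  // content-type gives the mimeType
  if (headers['content-type']) {
    md.mimetype = headers['content-type']
  }
  return md
}
```

A second, authoritative RLS check can still run after the upload once the real object metadata is known.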

.from(bucketName)
.signUploadObjectUrl(objectName, urlPath as string, uploadSignedUrlExpirationTime, owner, {
upsert: request.headers['x-upsert'] === 'true',
userMetadata: userMetadata,
Contributor

I think we don't actually support userMetadata on upload signed urls (which we should) so I'm glad you found this.

I believe we need to store the userMetadata in the returned JWT payload; otherwise, this information will be lost during the signedUpload request. We need to be careful, though, if we decide to store the metadata in the JWT, because it can be abused with large payloads. We can for now restrict the amount of data that we can store in the JWT to something like 1MB i think we already have a constant MAX_CUSTOM_METADATA_SIZE, which can be reused (re-refactored as a validation function, potentially in limits.ts)

But we also need to add the second part (on the upload route) to store the userMetadata received from the JWT. This way, the metadata can't be tampered with.

Contributor Author

I will do a follow up PR to handle the upload part.
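The size cap discussed above, limiting how much userMetadata can be embedded in the signed-URL JWT, could look roughly like this. The function name is hypothetical; the PR mentions an existing MAX_CUSTOM_METADATA_SIZE constant that could be reused, and the 1 MiB value here is an assumption based on the thread:

```typescript
// Assumed limit; the real constant lives in the storage codebase
const MAX_CUSTOM_METADATA_SIZE = 1024 * 1024 // 1 MiB

// Reject user metadata whose serialized form is too large to embed in a JWT
function validateUserMetadataSize(userMetadata: Record<string, unknown>): void {
  const bytes = new TextEncoder().encode(JSON.stringify(userMetadata)).length
  if (bytes > MAX_CUSTOM_METADATA_SIZE) {
    throw new Error(`user metadata exceeds ${MAX_CUSTOM_METADATA_SIZE} bytes`)
  }
}
```

Calling this before signing keeps abusive payloads out of the token while letting legitimate metadata ride along to the upload route.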

@TylerHillery TylerHillery force-pushed the tyler/STORAGE-258/user-metadata-in-rls branch from 57b96fb to 05a322a Compare February 3, 2026 01:24
@TylerHillery
Contributor Author

I thought we couldn't do this because we don't get the objectMetadata until the object has been uploaded.
You are correct on this, but potentially we could rely on the:

  • content-length of the request for the file size
  • content-type for the mimeType

Other fields like etag, lastModified, it's fine if they are empty on the first RLS check

I have updated canUpload to accept a metadata parameter that gets passed to the RLS check. Only content-length and content-type get passed through.

Multipart uploads required a little more work, as the metadata field was not currently present in s3_multipart_uploads, so I had to create a migration.

@TylerHillery TylerHillery requested a review from fenos February 3, 2026 01:46
bucketId: string
objectName: string
file: FileUpload
userMetadata: Record<string, unknown> | undefined
Member

why do we diverge from optionality? userMetadata?: Record<string, unknown>

Contributor Author

While implementing this change I found it very easy to forget to pass userMetadata somewhere it should be passed, so I thought I'd make it required but allow undefined where it's not needed.

I called this out in the PR as I wasn't sure if there was a better way, and I wasn't 100% confident in this solution. If you think optionality is better here, I'm happy to change it back.

calls to ensure userMetadata is passed through. I found it difficult to ensure I caught every cause though, e.g. it was very easy to miss uploadNewObject needed to be updated in the uploadFromRequest. I'm open to feedback to see if there is a more robust solution we could implement that could be caught with typescript type

Member

I see your point, but I would go with optional since it's more idiomatic. Maybe revisit this as a cleanup afterwards.

Contributor Author

reverted back to optional
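The trade-off debated in this thread can be made concrete with a small sketch. Both interface shapes below are illustrations, not the project's real UploadRequest; the point is that the required-but-possibly-undefined form makes the compiler flag any caller that forgets the field, while the optional form lets omissions pass silently:

```typescript
// Strict form: callers must spell out userMetadata, even as undefined.
interface UploadRequestStrict {
  bucketId: string
  objectName: string
  userMetadata: Record<string, unknown> | undefined
}

// Idiomatic optional form: the field can be silently omitted.
interface UploadRequestOptional {
  bucketId: string
  objectName: string
  userMetadata?: Record<string, unknown>
}

// Compiles with the optional form even though userMetadata is missing:
const a: UploadRequestOptional = { bucketId: 'b', objectName: 'o' }

// The strict form forces an explicit decision at every call site:
const b: UploadRequestStrict = {
  bucketId: 'b',
  objectName: 'o',
  userMetadata: undefined,
}
```

TypeScript's exactOptionalPropertyTypes compiler flag narrows the gap between the two styles somewhat, but it still does not force callers to mention an optional field.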

owner_id: owner,
user_metadata: metadata,
user_metadata: userMetadata,
metadata: metadata,
Member

don't we need a guard rule for new column similar to user_metadata?

Contributor Author

Yes, you're right. I added a guard rule, but it feels a bit hacky. The reason I didn't add it to the rules in normalizedColumns is that the metadata column does exist in the objects table, and it's my understanding that any metadata column would get removed regardless of the table it's selecting from.

So instead I checked whether the latest migration was >= s3-multipart-uploads-metadata and added it to the list of columns to select.

I almost feel normalizedColumns should be table dependent. That might be better in a follow-up PR though.

Member

totally agree, also see #825 (comment)
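The migration-gated column guard described above could be sketched like this. The migration names, their ordering, and the function name are assumptions for illustration; only the s3-multipart-uploads-metadata migration name comes from the discussion:

```typescript
// Hypothetical ordered migration list for a tenant database
const MIGRATION_ORDER = [
  'initial-schema',
  's3-multipart-uploads',
  's3-multipart-uploads-metadata',
]

// Only select the new metadata column once the tenant has run the
// migration that adds it, so older tenants don't hit a missing column.
function multipartColumns(latestMigration: string): string[] {
  const cols = ['id', 'version', 'user_metadata']
  const latest = MIGRATION_ORDER.indexOf(latestMigration)
  const required = MIGRATION_ORDER.indexOf('s3-multipart-uploads-metadata')
  if (latest >= required) {
    cols.push('metadata')
  }
  return cols
}
```

Making normalizedColumns table-aware, as suggested above, would let this guard live in one place instead of at each call site.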

@TylerHillery TylerHillery force-pushed the tyler/STORAGE-258/user-metadata-in-rls branch from 05a322a to 5d4edd0 Compare February 10, 2026 20:25
@TylerHillery TylerHillery force-pushed the tyler/STORAGE-258/user-metadata-in-rls branch from 5d4edd0 to 8c316f6 Compare February 23, 2026 19:06
@coderabbitai

coderabbitai bot commented Feb 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds a nullable jsonb metadata column to storage.s3_multipart_uploads and updates code paths so client-provided user metadata and content metadata (mimetype, contentLength) are accepted, persisted, and propagated into upload authorization checks. Changes touch HTTP routes (signed upload headers), TUS lifecycle, uploader interfaces, storage handlers, DB adapter/knex, multipart schemas, and tests; the DB migration mapping is updated and tests for RLS and migration compatibility were added.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTP_Route as HTTP Route / TUS Entry
    participant Uploader
    participant Storage
    participant DB
    participant RLS

    Client->>HTTP_Route: Upload request (headers: x-metadata, content-type, content-length)
    HTTP_Route->>Uploader: call canUpload/prepareUpload (includes userMetadata, metadata)
    Uploader->>Storage: create/prepare multipart (passes userMetadata, metadata)
    Storage->>DB: insert/find multipart row (user_metadata, metadata)
    DB->>RLS: evaluate policy with user_metadata & metadata
    RLS-->>DB: allow/deny
    DB-->>Storage: result
    alt allowed
        Storage->>Uploader: success (tokens/parts)
        Uploader-->>Client: signed URL / upload acceptance
    else denied
        Uploader-->>Client: error (403/400)
    end

Assessment against linked issues

Objectives addressed (all from linked issue #757):

  • Enable user_metadata access in RLS policies during uploads
  • Propagate user_metadata through upload operations to canUpload validation
  • Support user_metadata in multipart upload workflows
  • Support content metadata (mimetype, contentLength) in RLS policy checks

Out-of-scope changes

  • New multipart resilience test and retry behavior (src/test/s3-protocol.test.ts): focuses on in_progress_size mutation and retry semantics, not required by the RLS/user_metadata objective.
  • Extensive test harness additions including tus-js-client helpers (src/test/rls.test.ts): adds broad test scenarios and client tooling to exercise flows; these expand coverage but are outside the single-issue objective.


properties: {
'x-upsert': { type: 'string' },
'content-type': { type: 'string' },
'content-length': { type: 'string' },
Member

don't we need x-metadata as well, since we try to read it below?

Contributor Author

yes, added

const customMd = request.headers['x-metadata']

if (typeof customMd === 'string') {
userMetadata = parseUserMetadata(customMd)
Member

this parser returns Record<string, string> by casting due to its expectation (and S3 compatibility) but there is no validation. Values could be anything.

This is an existing helper, feel free to add a TODO to be addressed later

Member

I looked into the supabase-js and S3 parts of this; the type is wrong but the handling is correct, and we return the x-amz-meta-missing header for non-string/non-printable values.

Member

@ferhatelmas ferhatelmas Feb 24, 2026

Another thing to note: when it's set and the content-type is multipart/form-data, we check/limit its size, but not when it comes from a header.

However, I saw the value wrong earlier; it's 1MB, not 1KB, so this is a valid assumption. We could reuse the code instead of keeping two parsers as a cleanup, but not in this PR.

Contributor Author

added TODO


const uploader = new Uploader(this.storage.backend, this.storage.db, this.storage.location)

const multipart = await this.shouldAllowPartUpload(UploadId, ContentLength, maxFileSize)
Member

this ordering can cause issues for valid uploads because it mutates the state.

For example (numbers are arbitrary)

  1. in progress 50mb and new part is 5mb
  2. in progress becomes 55mb
  3. RLS fails
  4. next request in progress becomes 60mb
  5. RLS checks but size fails because it was expecting at most 55mb

Contributor Author

Good catch, I changed the order so that canUpload is called before shouldAllowPartUpload. What I don't like about this is that it adds an extra call to findMultipartUpload, as it is also called in shouldAllowPartUpload. I thought about moving canUpload inside shouldAllowPartUpload, but it uses asSuperUser before starting the transaction, which would bypass RLS.

Can you confirm my understanding? Is there any way to do this that I'm missing that doesn't involve an extra findMultipartUpload call?

const multipart = await this.storage.db
.asSuperUser()
.findMultipartUpload(UploadId, 'id,version')
.findMultipartUpload(UploadId, 'id,version,user_metadata,metadata')
Member

we need the column guard for these reads as well, similar to create

Contributor Author

added the guard to the findMultipartUpload with a TODO comment to refactor once we change normalizeColumns to be table aware.

objectName: destinationKey,
owner,
isUpsert: upsert,
userMetadata: userMetadata,
Member

@ferhatelmas ferhatelmas Feb 24, 2026

similar to metadata, aren't we supposed to use origin.user_metadata if copyMetadata is false because it's the written value eventually?

Contributor Author

yes we are, changed it to the same logic we use for metadata

Contributor Author

@TylerHillery TylerHillery Feb 27, 2026

I take it back; using the same logic breaks some tests. For userMetadata we don't want to merge the two: either use all of the origin's metadata when copying, or use the new destination metadata.

@ferhatelmas
Member

Can we also add more tests please?

  1. copy with copyMetadata=true uses correct user metadata for RLS check
  2. multipart s3 RLS with metadata
  3. multipart progress accounting on auth failure
  4. signed upload RLS with metadata
  5. migration compatibility

metadata checked at sign time but mutable at upload time (to be addressed separately by me when this is merged)

@TylerHillery TylerHillery force-pushed the tyler/STORAGE-258/user-metadata-in-rls branch from 8c316f6 to 6ea5123 Compare February 27, 2026 00:28
@TylerHillery
Contributor Author

TylerHillery commented Feb 27, 2026

Can we also add more tests please?
copy with copyMetadata=true uses correct user metadata for RLS check
multipart s3 RLS with metadata
multipart progress accounting on auth failure
signed upload RLS with metadata
migration compatibility

Added all of the above except for migration compatibility. What did you have in mind for this? In CI we already ensure all migrations run, then are deleted and rerun, to verify they're idempotent. Does that satisfy migration compatibility?

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/storage/object.ts (1)

805-817: ⚠️ Potential issue | 🔴 Critical

Signed-upload RLS validation is bypassable because policy-checked metadata is not bound to the JWT.

At lines 805–812, canUpload evaluates options.userMetadata and options.metadata against row-level security policies, but at lines 815–817 the generated token only contains { owner, url, upsert }. The downstream upload handlers (uploadSignedObject and tus lifecycle) extract only those three fields and never enforce that request metadata matches the values validated at signing time. An attacker can obtain a signed URL with policy-compliant metadata, then submit different metadata during upload, bypassing the RLS check.

Add userMetadata and metadata to SignedUploadToken, include them in the signing payload, and enforce exact match (or reject caller-provided values) during upload execution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/object.ts` around lines 805 - 817, canUpload currently validates
options.userMetadata and options.metadata against RLS but the signed token only
includes {owner, url, upsert}, allowing an attacker to change metadata during
upload; update the SignedUploadToken type to include userMetadata and metadata,
include those fields in the payload passed to signJWT (in the code path that
calls getJwtSecret and signJWT), and then modify the upload handlers
(uploadSignedObject and the tus lifecycle handler) to read the token’s
userMetadata/metadata and enforce an exact match (or reject any caller-provided
metadata) before accepting the upload; use the existing canUpload, getJwtSecret,
signJWT, SignedUploadToken, uploadSignedObject, and tus lifecycle handlers as
the touchpoints.

ℹ️ Review info

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6ea5123 and a2f4938.

📒 Files selected for processing (1)
  • src/storage/object.ts

@TylerHillery TylerHillery force-pushed the tyler/STORAGE-258/user-metadata-in-rls branch from f184734 to b4676ca Compare February 28, 2026 14:46
@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/storage/object.ts (1)

805-819: ⚠️ Potential issue | 🔴 Critical

Metadata passed to authorization check is not bound to signed upload tokens.

At line 805, canUpload() validates the operation using options.userMetadata and options.metadata. However, the JWT signed at line 816 omits both fields. During upload execution in uploadSignedObject.ts, uploadFromRequest() extracts metadata from the current request, not from the token—allowing a user to sign a URL with one metadata set and upload with a different one, bypassing the original authorization check.

Fix: Include metadata fields in the signed JWT payload:

Suggested fix
-    const token = await signJWT(
-      { owner, url, upsert: Boolean(options?.upsert) },
+    const token = await signJWT(
+      {
+        owner,
+        url,
+        upsert: Boolean(options?.upsert),
+        userMetadata: options?.userMetadata,
+        metadata: options?.metadata,
+      },
       urlSigningKey,
       expiresIn
     )

Also validate that request metadata at upload time matches the token-bound metadata (or re-run canUpload() with the same metadata set under the original authorization context).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/object.ts` around lines 805 - 819, The authorization metadata
(userMetadata and metadata) passed to canUpload is not included in the signed
token: update the signJWT payload in the code that calls signJWT to include
options?.userMetadata and options?.metadata (alongside owner, url, upsert) so
the signed token binds those values; then in uploadSignedObject.ts
(uploadFromRequest) verify the incoming request's metadata matches the
token-bound metadata or re-run canUpload with the same metadata under the
original auth context (use getJwtSecret/signJWT token payload and canUpload) to
prevent replacing metadata at upload time.


📥 Commits

Reviewing files that changed from the base of the PR and between f184734 and b4676ca.

📒 Files selected for processing (16)
  • migrations/tenant/0057-s3-multipart-uploads-metadata.sql
  • src/http/routes/object/getSignedUploadURL.ts
  • src/http/routes/tus/index.ts
  • src/http/routes/tus/lifecycle.ts
  • src/internal/database/migrations/types.ts
  • src/storage/database/adapter.ts
  • src/storage/database/knex.ts
  • src/storage/object.ts
  • src/storage/protocols/s3/s3-handler.ts
  • src/storage/schemas/multipart.ts
  • src/storage/uploader.ts
  • src/test/cdn.test.ts
  • src/test/rls.test.ts
  • src/test/rls_tests.yaml
  • src/test/s3-protocol.test.ts
  • src/test/scanner.test.ts
💤 Files with no reviewable changes (1)
  • src/test/scanner.test.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/storage/database/adapter.ts
  • src/http/routes/object/getSignedUploadURL.ts
  • src/test/rls_tests.yaml
  • src/storage/schemas/multipart.ts

@TylerHillery TylerHillery force-pushed the tyler/STORAGE-258/user-metadata-in-rls branch from f33de9d to 2dd0938 Compare February 28, 2026 15:04


Development

Successfully merging this pull request may close these issues.

Can't use user_metadata in RLS policies when uploading a file

4 participants