fix(sdk)!: reclassify KAS 400 errors — distinguish tamper from misconfiguration#3166

Merged
marythought merged 12 commits into main from fix/dspx-2606-rewrap-error-classification
Apr 2, 2026

@marythought marythought commented Mar 17, 2026

Problem

Today, every KAS InvalidArgument (400) error is classified as ErrRewrapBadRequest, which wraps ErrTampered, regardless of the actual cause. As a result, errors.Is(err, sdk.ErrTampered) returns true for misconfiguration errors such as a wrong key ID, an unsupported key type, or missing EC-wrapped support, not just for actual integrity failures such as policy binding mismatches or DEK decryption failures.

This makes ErrTampered unreliable as a signal. SDK consumers cannot distinguish "this TDF was tampered with" from "your KAS setup is wrong," which undermines the tamper detection API.

Approach

Changing gRPC status codes for policy binding failures would leak information about computations involving secret key material. Instead, this PR splits the error signal at the message level:

  1. KAS (service/kas/access/rewrap.go): non-secret 400s now return descriptive messages (e.g. "unsupported key type", "key access object is nil", "ec-wrapped not enabled"). Errors involving secret key material — policy binding, DEK decryption, corrupted policy body, malformed binding encoding — keep the generic "bad request" to avoid leaking secret-derived information.

  2. SDK (sdk/tdf.go): classifies errors based on the message. An error containing the substring "desc = bad request" (anchored to the gRPC status description field) is treated as potential tamper: ErrRewrapBadRequest, under ErrTampered. Any other InvalidArgument error is treated as misconfiguration: ErrKASRequestError, not ErrTampered. Per-KAO errors are serialized as plain strings through the proto response (not as gRPC status errors), so substring matching is the only classification mechanism available.

  3. Shared contract: the generic message pattern is defined as kasGenericBadRequest in sdk/tdferrors.go, with a cross-reference comment in service/kas/access/rewrap.go so both sides stay in sync. The companion xtest PR, opentdf/tests#422 ("fix(xtest): rename policy binding assertions to reflect tamper classification"), provides runtime enforcement across the Go, Java, and JS SDKs.

Error classification

| Error source | KAS message | SDK sentinel | errors.Is(ErrTampered)? |
| --- | --- | --- | --- |
| Policy binding mismatch | "bad request" (generic) | ErrRewrapBadRequest | Yes |
| DEK decryption failure | "bad request" (generic) | ErrRewrapBadRequest | Yes |
| Corrupted policy body | "bad request" (generic) | ErrRewrapBadRequest | Yes |
| Misconfiguration | descriptive (e.g. "unsupported key type") | ErrKASRequestError | No |
| Access denied | "forbidden" | ErrRewrapForbidden | No |

Breaking changes

1. ErrTampered no longer matches KAS misconfiguration errors

errors.Is(err, sdk.ErrTampered) no longer matches KAS misconfiguration errors (descriptive 400s). Consumers who relied on this to catch all KAS 400s should add a check for ErrKASRequestError:

if errors.Is(err, sdk.ErrTampered) {
    // integrity failure (tamper)
} else if errors.Is(err, sdk.ErrKASRequestError) {
    // client/configuration error (400 or 403)
}

2. ErrRewrapForbidden is now under ErrKASRequestError

ErrRewrapForbidden now wraps ErrKASRequestError, meaning errors.Is(err, ErrKASRequestError) matches both misconfiguration 400s and forbidden 403s. Previously ErrRewrapForbidden was a standalone error. This groups all KAS request-level errors under one sentinel, separate from the tamper hierarchy.

3. Pre-existing descriptive err400 calls reclassified

16 pre-existing err400 calls in the early request validation path (e.g. "invalid request body", "missing request body", "clientPublicKey failure", "bad key for rewrap") already had descriptive messages. Under the old code, ALL InvalidArgument errors were classified as ErrRewrapBadRequest (tamper) regardless of message. Now they are correctly classified as ErrKASRequestError (misconfiguration). These are all client-side validation errors that do not involve secret key material, so this is the correct classification.

4. tdf3Rewrap no longer short-circuits on per-KAO errors

Previously, tdf3Rewrap had an early-return path where verifyRewrapRequests errors (other than errNoValidKeyAccessObjects) would short-circuit with a top-level err400("invalid request"), discarding per-KAO results. Now per-KAO results are always preserved in the results map even when errors occur, so tamper signals from individual KAOs reach the SDK instead of being replaced by a generic top-level error.

Test plan

  • TestGetKasErrorToReturn — all 11 descriptive messages, generic "bad request" → ErrTampered, desc-prefix anchoring, middleware false-positive prevention
  • KAS access tests pass
  • golangci-lint clean
  • Companion xtest PR opentdf/tests#422 ("fix(xtest): rename policy binding assertions to reflect tamper classification"): adds the tamper-error-split feature flag and a strict assertion that policy binding failures produce tamper errors (not KAS request errors)
  • xtest CI passed (Go, Java, JS) — run from xtest branch with tightened assertions against this platform branch

Note: The platform repository's platform-xtest CI job runs xtests from main (opentdf/tests/.github/workflows/xtest.yml@main), which still has the permissive assertions. The strict assertions in opentdf/tests#422 should be merged after this PR lands.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Added cross-platform SDK compatibility testing workflow documentation for contributors
  • Bug Fixes

    • Enhanced KAS error classification to distinguish between request/configuration failures and integrity issues
    • Improved rewrap operation error messages for better debugging while protecting sensitive information
  • Tests

    • Expanded KAS error handling test coverage for additional error scenarios

@marythought marythought requested review from a team as code owners March 17, 2026 15:14
@github-actions github-actions bot added the comp:sdk A software development kit, including library, for client applications and inter-service communicati label Mar 17, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the error handling within the SDK to provide a more accurate classification of KAS-related errors. By introducing a dedicated error type for KAS client/configuration issues and adjusting the parentage of an existing error, it prevents false positives when checking for data tampering and improves the clarity of error propagation.

Highlights

  • Error Hierarchy Refinement: Introduced ErrKASRequestError as a new sentinel error to specifically categorize KAS client and configuration-related failures.
  • Error Re-parenting: Re-parented ErrRewrapBadRequest (KAS 400 errors) to wrap ErrKASRequestError instead of ErrTampered, ensuring a clearer distinction between KAS request issues and data integrity failures.
  • Test Coverage: Added new test assertions to confirm that ErrRewrapBadRequest now correctly matches ErrKASRequestError and explicitly does not match ErrTampered.

Changelog
  • sdk/tdf_test.go
    • Added an assertion to ensure ErrRewrapBadRequest is correctly identified as ErrKASRequestError.
    • Included a negative assertion to verify ErrRewrapBadRequest no longer matches ErrTampered.
  • sdk/tdferrors.go
    • Defined a new sentinel error, ErrKASRequestError, to categorize KAS client/configuration issues.
    • Modified ErrRewrapBadRequest to wrap ErrKASRequestError instead of ErrTampered, changing its error hierarchy.
Activity
  • The TestGetKasErrorToReturn function was updated with new assertions for ErrKASRequestError and NotErrorIs(ErrTampered).
  • The pull request description indicates that CI lint and tests are expected to pass.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly reclassifies ErrRewrapBadRequest to be a KAS request error instead of a tamper error, which improves error handling clarity. The introduction of ErrKASRequestError is a good step towards better error categorization. I've suggested one minor improvement to also bring ErrRewrapForbidden into this new error hierarchy for consistency.

Comment thread sdk/tdferrors.go Outdated
@github-actions

Benchmark results, click to expand

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 161.836503ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 85.826045ms

Benchmark Statistics

Name № Requests Avg Duration Min Duration Max Duration

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 382.988081ms
Throughput 261.10 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 40.300875285s
Average Latency 401.559941ms
Throughput 124.07 requests/second

@github-actions

Benchmark results, click to expand

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 199.560372ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 99.607246ms

Benchmark Statistics

Name № Requests Avg Duration Min Duration Max Duration

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 381.224418ms
Throughput 262.31 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 40.284613784s
Average Latency 400.940231ms
Throughput 124.12 requests/second

@marythought

Companion xtest PR: opentdf/tests#421

That PR updates test_tdf_with_unbound_policy and test_tdf_with_altered_policy_binding to expect KAS request errors instead of tamper errors, matching the reclassification in this PR. It needs to merge first for xtest CI to pass here.

marythought added a commit to opentdf/tests that referenced this pull request Mar 18, 2026
…421)

## Summary
- Add `assert_kas_request_error` for policy binding failure tests
(`test_tdf_with_unbound_policy`, `test_tdf_with_altered_policy_binding`)
- These tests now correctly expect KAS request errors (400) rather than
tamper/integrity errors
- Backward compatible: accepts both new (`KAS request error`, `rewrap
request 400`) and legacy (`tamper`, `InvalidFileError`) error strings
- Remove unused `"wrap"` case from `assert_tamper_error`

## Context
Companion to opentdf/platform#3166, which reclassifies
`ErrRewrapBadRequest` away from `ErrTampered` and under a new
`ErrKASRequestError`. Policy binding mismatches cause KAS to return 400
Bad Request — a request-level error, not a tamper/integrity issue.

## Test plan
- [ ] Merge this PR first
- [ ] Verify opentdf/platform#3166 xtest jobs pass after this lands

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Mary Dickson <mary.dickson@virtru.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c-r33d
c-r33d previously approved these changes Mar 18, 2026
@marythought marythought enabled auto-merge March 18, 2026 15:33
@marythought marythought disabled auto-merge March 18, 2026 16:58
@marythought marythought marked this pull request as draft March 18, 2026 16:59
@marythought

@elizabethhealy identified a case where this change can raise a false negative. Moving back to draft while investigating what's possible in KAS.

gemini-code-assist[bot]

This comment was marked as outdated.

@github-actions

Benchmark results, click to expand

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 191.224808ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 93.325279ms

Benchmark Statistics

Name № Requests Avg Duration Min Duration Max Duration

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 375.960498ms
Throughput 265.99 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 39.184173995s
Average Latency 389.898784ms
Throughput 127.60 requests/second

@github-actions

X-Test Failure Report

bats-test-results
opentdfplatformESYT3C.dockerbuild

@github-actions

Benchmark results, click to expand

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 190.078209ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 97.030557ms

Benchmark Statistics

Name № Requests Avg Duration Min Duration Max Duration

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 381.964093ms
Throughput 261.80 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 39.364161695s
Average Latency 392.060857ms
Throughput 127.02 requests/second

@github-actions

Benchmark results, click to expand

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 195.114902ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 92.71594ms

Benchmark Statistics

Name № Requests Avg Duration Min Duration Max Duration

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 401.068868ms
Throughput 249.33 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 39.065901364s
Average Latency 389.21885ms
Throughput 127.99 requests/second

gemini-code-assist[bot]

This comment was marked as outdated.

@marythought marythought changed the title fix(sdk): reclassify ErrRewrapBadRequest away from ErrTampered fix(sdk)!: reclassify KAS 400 errors — distinguish tamper from misconfiguration Mar 18, 2026
marythought and others added 8 commits April 2, 2026 09:09
…ic for tamper

Instead of changing the gRPC status code for policy binding failures
(which would leak information about secret-key computations), make
non-secret KAS 400 errors descriptive (e.g. "unsupported key type",
"key access object is nil") so the SDK can distinguish them from the
generic "bad request" that policy binding failures produce.

SDK logic: generic "bad request" from KAS → ErrRewrapBadRequest (under
ErrTampered, potential integrity failure); specific message →
ErrKASRequestError (misconfiguration, not tamper).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Mary Dickson <mary.dickson@virtru.com>
…KAS errors

Document why policy binding and DEK decryption failures intentionally
use generic "bad request" messages — to avoid leaking information about
computations involving secret key material.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Mary Dickson <mary.dickson@virtru.com>

Policy body parse failures and binding encoding errors may indicate
tamper — keep them as generic "bad request" so the SDK classifies
them under ErrTampered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Mary Dickson <mary.dickson@virtru.com>

Extract "bad request" into kasGenericBadRequest constant in the SDK so
both sides of the contract are linked. Add KAS-side comment referencing
the SDK constant. Expand test coverage with subtests for all descriptive
KAS messages and a substring false-match edge case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Mary Dickson <mary.dickson@virtru.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Mary Dickson <mary.dickson@virtru.com>
…r messages

Address code review feedback:
- Change kasGenericBadRequest from "bad request" to "desc = bad request"
  to anchor the match to the gRPC status description field, avoiding
  false positives from middleware or error wrapping
- Replace dynamic err400(err.Error()) with fixed descriptive message to
  prevent information leakage and accidental tamper classification
- Add all 10 descriptive KAS messages to the test table (was 5)
- Add test for middleware-injected "bad request" without desc prefix
- Update comments to accurately describe substring matching behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Mary Dickson <mary.dickson@virtru.com>

When verifyRewrapRequests returned policyErr, tdf3Rewrap discarded the
per-KAO results (which contained generic "bad request" tamper signals)
and returned err400("invalid request") instead. The SDK classified
"invalid request" as ErrKASRequestError, silently losing the tamper
signal for corrupted policy bodies.

Fix: always store per-KAO results before continuing, regardless of the
error type from verifyRewrapRequests. This ensures the SDK receives the
per-KAO "bad request" entries and classifies them as ErrTampered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Mary Dickson <mary.dickson@virtru.com>

A 400 (misconfiguration) and a 403 (authorization denied) have different
failure modes and different remediation paths. Nesting ErrRewrapForbidden
under ErrKASRequestError repeats the same lumping problem this PR set out
to fix — callers catching ErrKASRequestError to retry with different
config would incorrectly retry on 403s too.

The ticket (DSPX-2606) explicitly specified ErrKASRequestError should be
"separate from ErrRewrapForbidden (authorization)." Restore that
separation: ErrRewrapForbidden is now a standalone sentinel again.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Mary Dickson <mary.dickson@virtru.com>
@marythought marythought dismissed stale reviews from biscoe916, pflynn-virtru, and sujankota via 8ec8182 April 2, 2026 16:10
@marythought marythought force-pushed the fix/dspx-2606-rewrap-error-classification branch from 12ee687 to 8ec8182 Compare April 2, 2026 16:10
@policy-bot-opentdf policy-bot-opentdf bot dismissed stale reviews from sujankota April 2, 2026 16:10

Invalidated by push of 8ec8182

github-actions bot commented Apr 2, 2026

Benchmark results, click to expand

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 193.1099ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 1000
Denied Decision Requests 0
Total Time 98.451402ms

Benchmark Statistics

Name № Requests Avg Duration Min Duration Max Duration

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 390.019602ms
Throughput 256.40 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 39.890982271s
Average Latency 396.65753ms
Throughput 125.34 requests/second

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@service/kas/access/rewrap.go`:
- Around line 796-800: The base64 decode error path around
base64.StdEncoding.Decode(policyBinding, []byte(policyBindingB64Encoded))
currently reports a generic err400("bad request"); change that to return a
descriptive client-facing error (e.g. err400("invalid policy binding encoding"))
so the response matches the logged message and clarifies the failure; update the
call to failedKAORewrap(results, kao, ...) to pass the new descriptive error
while leaving the existing p.Logger.WarnContext(ctx, "invalid policy binding
encoding", slog.Any("error", err)) intact.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5e09cb2d-8fd5-4489-97ec-b51a70a866b6

📥 Commits

Reviewing files that changed from the base of the PR and between 12ee687 and 8ec8182.

📒 Files selected for processing (5)
  • docs/Contributing.md
  • sdk/tdf.go
  • sdk/tdf_test.go
  • sdk/tdferrors.go
  • service/kas/access/rewrap.go

Comment thread service/kas/access/rewrap.go

@marythought marythought added this pull request to the merge queue Apr 2, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 2, 2026
@marythought marythought added this pull request to the merge queue Apr 2, 2026
Merged via the queue into main with commit f04a385 Apr 2, 2026
56 of 65 checks passed
@marythought marythought deleted the fix/dspx-2606-rewrap-error-classification branch April 2, 2026 19:07
Labels

comp:sdk A software development kit, including library, for client applications and inter-service communicati size/xs

5 participants