Skip to content

fix: s3 update and find fail with short bucket names#250

Merged
kferrone merged 12 commits intomainfrom
fix/s3-update
Apr 24, 2026
Merged

fix: s3 update and find fail with short bucket names#250
kferrone merged 12 commits intomainfrom
fix/s3-update

Conversation

@amaechiabuah
Copy link
Copy Markdown
Collaborator

Describe Changes

The S3 backend API's GET endpoint passes the raw bucket name directly to AWS SDK via LocateBucket(), which cannot resolve short names (e.g. test-bucket) to full AWS names (e.g. duploservices-amaechi-test-bucket-182680712604). The PUT endpoint additionally requires the lowercase s3bucket route and the full bucket name in both the URL and request body.

  • find: Replaced direct GET lookup with list-and-match that resolves short names by checking if the full bucket name starts with prefixed_name(short_name)
  • update: New override that resolves the full name via find, sets it in the body, and PUTs to the lowercase aws/s3bucket/{full_name} endpoint

Previously, apply only worked because find failed with a short name and fell through to the create path. Now find, update, and apply all work correctly with short names.

Link to Issues

https://app.clickup.com/t/8655600/DUPLO-41903

PR Review Checklist

  • Thoroughly reviewed on local machine.
  • Have you added any tests
  • Make sure to note changes in Changelog

@zafarabbas
Copy link
Copy Markdown
Contributor

Task linked: DUPLO-41903 s3: update fails

@qodo-code-review
Copy link
Copy Markdown
Contributor

Review Summary by Qodo

Fix S3 update and find operations with short bucket names

🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Fixed s3 find to resolve short bucket names via list-and-match
• Fixed s3 update to use full bucket name and lowercase endpoint
• Added error handling with proper exception types
• Enables apply to work correctly with short bucket names
Diagram
flowchart LR
  A["Short bucket name<br/>e.g. mybucket"] -->|find method| B["List all buckets"]
  B -->|match prefix| C["Resolve full name<br/>e.g. duploservices-tenant-mybucket-123456"]
  C -->|update method| D["PUT to lowercase<br/>aws/s3bucket endpoint"]
  D -->|with full name| E["Updated bucket"]
Loading

Grey Divider

File Changes

1. src/duplo_resource/s3.py 🐞 Bug fix +52/-4

Implement short bucket name resolution and update endpoint

• Added imports for DuploError and DuploNotFound exception handling
• Replaced find method to use list-and-match approach for resolving short bucket names
• Added new update method that resolves full bucket name and uses lowercase s3bucket endpoint
• Updated docstrings to reflect support for both short and full bucket names
• Changed exception type from DuploError to DuploNotFound in find method

src/duplo_resource/s3.py


2. CHANGELOG.md 📝 Documentation +1/-0

Document S3 short bucket name fix

• Added entry documenting the fix for s3 update and s3 find with short bucket names
• Noted the resolution mechanism using list-and-match and lowercase endpoint

CHANGELOG.md


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown
Contributor

qodo-code-review Bot commented Apr 2, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (2) 📎 Requirement gaps (0)

Grey Divider


Action required

1. update() assumes dict body 📘 Rule violation ☼ Reliability
Description
update() calls self.name_from_body(body) and assigns body["Name"] without validating that
body is a dict and contains the required Name key. This can raise TypeError/KeyError instead
of a clear domain error when the request body is missing, non-object YAML, or lacks required fields.
Code

src/duplo_resource/s3.py[R67-75]

+    if not name and not body:
+      raise DuploError("Name is required when body is not provided")
+    name = name if name else self.name_from_body(body)
+    current = self.find(name)
+    full_name = self.name_from_body(current)
+    if body:
+      body["Name"] = full_name
+    else:
+      body = current
Evidence
PR Compliance ID 1 requires validating optional request bodies (type/shape and required keys) before
accessing or mutating them. The new update() implementation derives name from body and writes
into body without any dict/type/key checks, while name_from_body() directly indexes
body["Name"].

src/duplo_resource/s3.py[67-75]
src/duplo_resource/s3.py[13-14]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`update()` uses `body` as if it is always a dict with a `Name` key (`self.name_from_body(body)` and `body["Name"] = ...`). If `body` is missing, non-dict YAML (e.g., a list/string), or missing `Name`, this will crash with `TypeError`/`KeyError` instead of raising a clear domain error.
## Issue Context
The compliance requirement expects explicit validation of optional request bodies and body-derived fields, raising a clear domain-specific error (e.g., `DuploError`) rather than leaking Python exceptions.
## Fix Focus Areas
- src/duplo_resource/s3.py[67-75]
- src/duplo_resource/s3.py[13-14]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. update() unguards response.json() 📘 Rule violation ☼ Reliability
Description
The new update() implementation always calls response.json() after the PUT request without
checking for HTTP 204 or empty response content. This can cause JSON decode errors on successful
empty responses.
Code

src/duplo_resource/s3.py[R78-82]

+    response = self.client.put(
+      self.endpoint(full_name).replace("aws/s3Bucket", "aws/s3bucket"),
+      body
+    )
return response.json()
Evidence
PR Compliance ID 2 requires guarding against 204/empty responses before JSON parsing. The new code
returns response.json() unconditionally, and the underlying client treats any 2xx (including 204)
as success without ensuring a JSON body exists.

src/duplo_resource/s3.py[78-82]
src/duplocloud/client.py[148-150]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`update()` calls `response.json()` even when the HTTP response has no body (e.g., 204 No Content), which can raise a JSON decode error despite the request succeeding.
## Issue Context
The shared HTTP client returns the raw `requests.Response` for any 2xx status code (including 204). Callers must check `response.status_code` and/or `response.content` before invoking `response.json()`.
## Fix Focus Areas
- src/duplo_resource/s3.py[78-82]
- src/duplocloud/client.py[148-150]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Wrong bucket selected🐞 Bug ≡ Correctness
Description
DuploS3.find() matches buckets by full.startswith(prefixed_name(short)), which can return the wrong
bucket when another bucket’s name shares that prefix (e.g., "...-mybucket2-..." matches
"...-mybucket"). This can cause update/apply to operate on an unintended bucket because the first
list match is returned without ambiguity detection.
Code

src/duplo_resource/s3.py[R37-41]

+    prefix = self.prefixed_name(name)
+    for bucket in self.list():
+      full = bucket["Name"]
+      if full == name or full.startswith(prefix):
+        return bucket
Evidence
prefixed_name() only prepends the tenant prefix and does not add a delimiter after the provided
short name; therefore startswith(prefix) is ambiguous when bucket names share the same leading
characters. find() returns the first match from list() with no check for multiple matches, making
selection order-dependent.

src/duplo_resource/s3.py[37-42]
src/duplocloud/commander.py[69-90]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DuploS3.find()` resolves short bucket names by `full.startswith(prefix)` where `prefix = prefixed_name(short)`. Because `prefixed_name()` simply prepends the tenant prefix and does not enforce a boundary after the short name, buckets like `...-mybucket2-...` will match `...-mybucket`, and `find()` will return whichever appears first in the list.
### Issue Context
This is used to resolve short names for subsequent operations (e.g., update/apply). Returning the wrong bucket is a correctness issue.
### Fix Focus Areas
- src/duplo_resource/s3.py[37-42]
- src/duplocloud/commander.py[69-90]
### Suggested fix
- Tighten matching to require a boundary after the short name, e.g.:
- accept exact full-name match: `full == name`
- accept exact prefixed-short match: `full == prefix`
- accept prefixed-short + suffix: `full.startswith(prefix + "-")`
- Collect all matches; if `len(matches) > 1`, raise a `DuploError` indicating an ambiguous short name rather than returning the first result.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

4. Stale bucket list cache 🐞 Bug ☼ Reliability
Description
DuploS3.find() depends on list(), and list() uses the cached DuploAPI.get(), so find() can return
stale results for up to the GET cache TTL within the same long-lived process. This can yield
transient false "not found" errors (or miss newly-created/deleted buckets) until the cache expires.
Code

src/duplo_resource/s3.py[R37-42]

+    prefix = self.prefixed_name(name)
+    for bucket in self.list():
+      full = bucket["Name"]
+      if full == name or full.startswith(prefix):
+        return bucket
+    raise DuploNotFound(name, "s3")
Evidence
DuploAPI.get() is cached via a TTLCache, and DuploResourceV3.list() is implemented using
client.get(endpoint()). Since DuploS3.find() now relies on list(), it inherits this caching behavior
and may observe stale list data within a single process.

src/duplocloud/client.py[20-104]
src/duplocloud/resource.py[179-192]
src/duplo_resource/s3.py[37-42]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DuploS3.find()` now calls `self.list()`, which uses `DuploAPI.get()` under the hood. `DuploAPI.get()` is cached with a TTL, so `find()` can observe stale bucket lists within the same process.
### Issue Context
This is most visible in long-lived processes or integration tests that reuse a single `DuploCtl`/client instance across operations.
### Fix Focus Areas
- src/duplo_resource/s3.py[37-42]
- src/duplocloud/client.py[20-104]
- src/duplocloud/resource.py[179-192]
### Suggested fix
Implement an uncached path for the list call used by `s3.find()`, for example:
- Add a `get_uncached()` method to `DuploAPI` (delegating directly to `_request`) and use it in an overridden `DuploS3.list()` or directly in `DuploS3.find()`.
- Alternatively, override `DuploS3.list()` to call `self.client._request("GET", self.endpoint())` (if acceptable) so only S3 bucket listing bypasses cache.
Avoid globally disabling the GET cache unless that is intended behavior.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

5. Duplicate find in apply 🐞 Bug ➹ Performance
Description
apply() already calls find() before invoking update(), but DuploS3.update() calls find() again,
causing two full list scans per apply on existing buckets. This increases latency proportional to
bucket count.
Code

src/duplo_resource/s3.py[R70-71]

+    current = self.find(name)
+    full_name = self.name_from_body(current)
Evidence
DuploResource.apply() does self.find(name) then self.update(...), and the new DuploS3.update()
performs current = self.find(name) again, duplicating the list-based lookup work.

src/duplocloud/resource.py[339-344]
src/duplo_resource/s3.py[67-76]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
S3 apply/update path performs redundant lookups: `apply()` calls `find()` and then `update()`, and `DuploS3.update()` calls `find()` again to resolve the full bucket name.
### Issue Context
This is a performance-only issue, but the new `find()` is list-based and may be expensive on tenants with many buckets.
### Fix Focus Areas
- src/duplocloud/resource.py[339-344]
- src/duplo_resource/s3.py[67-81]
### Suggested fix
Override `DuploS3.apply()` so it resolves the bucket once and then performs the PUT without re-calling `find()` (e.g., an internal helper like `_put_full_name(full_name, body, patches)`). Keep CLI surface area unchanged; just avoid the second list scan when apply already proved existence.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

@duploctl
Copy link
Copy Markdown
Contributor

duploctl Bot commented Apr 2, 2026

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
3757 1433 38% 0% 🟢

New Files

No new covered files...

Modified Files

File Coverage Status
src/duplo_resource/s3.py 0% 🟢
TOTAL 0% 🟢

updated for commit: 7b21a4a by action🐍

Comment thread src/duplo_resource/s3.py
Comment thread src/duplo_resource/s3.py
Comment thread src/duplo_resource/s3.py Outdated
@amaechiabuah amaechiabuah requested a review from kferrone April 6, 2026 12:50
Comment thread src/duplo_resource/s3.py Outdated
DuploNotFound: If the bucket could not be found.
"""
prefix = self.prefixed_name(name)
for bucket in self.list():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am usually against using list for GET purposes as it does not scale well.

For this case, however, I think its actually fine because of the lower default bucket quota per AWS account (I think the limit is 100?)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure that using the LIST to GET is a good idea.
There is an endpoint to get a single bucket and this change removes that. Now we would never be able to use this cli to hit that endpoint.

I think we can think about this a bit more and come up with a better solution even if it means sacrificing some bells and whistles to make the logic more "dumb."

Comment thread src/duplo_resource/s3.py Outdated
Copy link
Copy Markdown
Collaborator

@kferrone kferrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an idea, instead of making a call to the list to guarantee some name exists. There is some way we can determine if something is using a prefixed name or not.

Like we could always try twice, first find takes the name as is, if we get a not found then we try one more time with the prefix. I have done this logic before and it worked well.

@amaechiabuah
Copy link
Copy Markdown
Collaborator Author

I have an idea, instead of making a call to the list to guarantee some name exists. There is some way we can determine if something is using a prefixed name or not.

Like we could always try twice, first find takes the name as is, if we get a not found then we try one more time with the prefix. I have done this logic before and it worked well.

I see what you mean. The problem is that prefixed_name alone isn't enough to reconstruct the full bucket name. AWS buckets come back as duploservices--- (e.g. duploservices-amaechi-test-bucket-182680712604). So trying the short name, and then trying the prefix would still miss because we don't know the account-ID suffix without listing.

Maybe we could first attempt GET {name} as-is for the case where the user passes in the full name, and only fall back to list-and-match on DuploNotFound (handles short names). That way the direct endpoint is still exercised whenever the caller already knows the full name, and we only pay the list cost when resolving a short name.

Copy link
Copy Markdown
Collaborator

@kferrone kferrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the System resource to get the Account id.

Something like this in the constructor:

sys_svc = self.duplo.load("system")

Then when you need to:

sys_info = sys_svc.info()

Then you can find the account id on sys_info.

Copy link
Copy Markdown
Collaborator

@kferrone kferrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the logic would be like this:

  1. try to find with the name given as is.
  2. if the name is found as is, then return that
  3. if the name given was not found, then build the generated "fullname" with account id and tenant prefix, try and find that
  4. if long name found return it
  5. if not even the long name is found, then truly raise a DuploNotFound error

@amaechiabuah amaechiabuah requested a review from kferrone April 20, 2026 18:28
kferrone and others added 2 commits April 24, 2026 15:58
Co-authored-by: Copilot <copilot@github.com>
Copy link
Copy Markdown
Collaborator

@kferrone kferrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@kferrone kferrone merged commit 61a033a into main Apr 24, 2026
7 checks passed
@kferrone kferrone deleted the fix/s3-update branch April 24, 2026 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants