1.9.x to main#143
Conversation
update baseurl + add new lang
📝 WalkthroughWalkthroughAdds SUPPORTED_LANGUAGES and derives LANGUAGE_CHOICES from it; implements archive_metadata_csv and finalize_metadata_csv and wires them into metadata upload flows; refactors WillisapiClient host construction with get_api_host(); bumps version to 1.9.7. ChangesMetadata archiving, language choices, and client URL updates
Sequence Diagram(s)sequenceDiagram
participant Client
participant WillisAPI
participant S3
Client->>WillisAPI: POST /metadata/archive (init payload)
WillisAPI-->>Client: 200 { record_id, presigned_url }
Client->>S3: PUT presigned_url (upload CSV)
S3-->>Client: 200 OK
Client->>WillisAPI: PATCH /metadata/archive/{record_id} (status, counts)
WillisAPI-->>Client: 200 OK
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Bump version to 1.9.7
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
willisapi_client/services/metadata/language_choices.py (1)
119-119: ⚡ Quick winUse a set for validator membership checks.
Line 119 builds a list, while
UploadUtils.validate_rowandvalidate_processed_data_rowrepeatedly perform membership checks. Exposing a set reduces per-row lookup cost on large uploads without changing existing API.♻️ Proposed refactor
LANGUAGE_CHOICES = [code for code, _ in SUPPORTED_LANGUAGES] +LANGUAGE_CHOICES_SET = set(LANGUAGE_CHOICES)-from .language_choices import ( - LANGUAGE_CHOICES, -) +from .language_choices import ( + LANGUAGE_CHOICES, + LANGUAGE_CHOICES_SET, +) - if self.row.language not in LANGUAGE_CHOICES: + if self.row.language not in LANGUAGE_CHOICES_SET: return (False, f"Invalid language: {self.row.language}") - if self.row.language not in LANGUAGE_CHOICES: + if self.row.language not in LANGUAGE_CHOICES_SET: return (False, f"Invalid language: {self.row.language}")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@willisapi_client/services/metadata/language_choices.py` at line 119, LANGUAGE_CHOICES is currently a list comprehension which causes O(n) membership checks in UploadUtils.validate_row and validate_processed_data_row; change LANGUAGE_CHOICES to a set built from SUPPORTED_LANGUAGES (e.g., set(code for code, _ in SUPPORTED_LANGUAGES) or {code for code, _ in SUPPORTED_LANGUAGES}) so membership tests against LANGUAGE_CHOICES in UploadUtils.validate_row and validate_processed_data_row become O(1) without changing the public API.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@willisapi_client/willisapi_client.py`:
- Around line 20-23: The get_api_host method currently interpolates self.env
directly into the hostname; validate and sanitize self.env before using it: in
get_api_host (and where self.env is set if applicable) ensure self.env matches a
DNS label pattern (e.g. only [A-Za-z0-9-], optionally lowercased) and reject or
normalize invalid characters (replace disallowed chars with '-' or raise
ValueError); if validation fails, fall back to returning self.api_uri instead of
constructing "api.{env}.willis.health". Ensure references to self.env and the
constructed host string are updated accordingly.
---
Nitpick comments:
In `@willisapi_client/services/metadata/language_choices.py`:
- Line 119: LANGUAGE_CHOICES is currently a list comprehension which causes O(n)
membership checks in UploadUtils.validate_row and validate_processed_data_row;
change LANGUAGE_CHOICES to a set built from SUPPORTED_LANGUAGES (e.g., set(code
for code, _ in SUPPORTED_LANGUAGES) or {code for code, _ in
SUPPORTED_LANGUAGES}) so membership tests against LANGUAGE_CHOICES in
UploadUtils.validate_row and validate_processed_data_row become O(1) without
changing the public API.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 53d1a203-4364-4f93-ac19-286eb3ab7438
📒 Files selected for processing (2)
willisapi_client/services/metadata/language_choices.pywillisapi_client/willisapi_client.py
| def get_api_host(self): | ||
| if self.env: | ||
| return f"https://{self.env}-{self.api_uri}/v{self.api_version}/" | ||
| return f"https://{self.api_uri}/v{self.api_version}/" | ||
| return f"api.{self.env}.willis.health" | ||
| return self.api_uri |
There was a problem hiding this comment.
Validate env before interpolating it into the host.
At Line 21–Line 22, env is inserted directly into the hostname. If env contains invalid label characters (for example, whitespace or /), URL construction becomes invalid and downstream API calls can fail at runtime.
Suggested fix
+import re
import setuptools
import math
@@
def get_api_host(self):
if self.env:
- return f"api.{self.env}.willis.health"
+ env = self.env.strip().lower()
+ if not re.fullmatch(r"[a-z0-9-]+", env):
+ raise ValueError("Invalid env; expected alphanumeric/hyphen label.")
+ return f"api.{env}.willis.health"
return self.api_uri🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@willisapi_client/willisapi_client.py` around lines 20 - 23, The get_api_host
method currently interpolates self.env directly into the hostname; validate and
sanitize self.env before using it: in get_api_host (and where self.env is set if
applicable) ensure self.env matches a DNS label pattern (e.g. only [A-Za-z0-9-],
optionally lowercased) and reject or normalize invalid characters (replace
disallowed chars with '-' or raise ValueError); if validation fails, fall back
to returning self.api_uri instead of constructing "api.{env}.willis.health".
Ensure references to self.env and the constructed host string are updated
accordingly.
add support to upload csv
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (3)
willisapi_client/services/metadata/upload.py (2)
154-161: 💤 Low valueConsider removing redundant
int()cast.Same as the earlier archive call:
csv.transformed_df.shape[0]already returns an integer.🔧 Optional simplification
archive_record_id = archive_metadata_csv( api_key, csv_path, - int(csv.transformed_df.shape[0]), + csv.transformed_df.shape[0], upload_type="processed_data", env=kwargs.get("env"), )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@willisapi_client/services/metadata/upload.py` around lines 154 - 161, Remove the redundant int() cast around csv.transformed_df.shape[0] when calling archive_metadata_csv; update the call in upload.py (the archive_metadata_csv invocation that assigns archive_record_id) to pass csv.transformed_df.shape[0] directly as the row count argument so you rely on the existing integer value instead of wrapping it with int().
45-52: 💤 Low valueConsider removing redundant
int()cast.
csv.transformed_df.shape[0]already returns an integer, so wrapping it inint()is unnecessary. However, this is harmless and might be defensive coding.🔧 Optional simplification
archive_record_id = archive_metadata_csv( api_key, csv_path, - int(csv.transformed_df.shape[0]), + csv.transformed_df.shape[0], upload_type="data", env=kwargs.get("env"), )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@willisapi_client/services/metadata/upload.py` around lines 45 - 52, Remove the redundant int() cast around csv.transformed_df.shape[0] in the call to archive_metadata_csv: pass csv.transformed_df.shape[0] directly as the row count argument to archive_metadata_csv (the call that assigns archive_record_id) instead of int(csv.transformed_df.shape[0]), since shape[0] already yields an int.willisapi_client/services/metadata/archive.py (1)
41-53: 💤 Low valueVerify
record_idbefore using it in finalize call.If
body.get("record_id")returnsNone, the finalize call on line 53 receivesNone. Whilefinalize_metadata_csvguards against this on line 66, the intent would be clearer if the check happened here before calling finalize.🔍 Proposed improvement for clarity
body = res.json() record_id = body.get("record_id") presigned = body.get("presigned_url") - if not presigned: + if not record_id or not presigned: - logger.warning("CSV archive: no presigned URL returned") + logger.warning("CSV archive: missing record_id or presigned URL") return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@willisapi_client/services/metadata/archive.py` around lines 41 - 53, The code extracts record_id = body.get("record_id") but may call finalize_metadata_csv(api_key, record_id, ...) with record_id == None; add an early check after retrieving record_id (before the S3 upload and before calling finalize_metadata_csv) to verify record_id is present and return or handle the error if missing; update the branch that logs "CSV archive: no presigned URL returned" and the failure branch that currently calls finalize_metadata_csv to first validate record_id and avoid passing None into finalize_metadata_csv (reference symbols: record_id, body.get, presigned, finalize_metadata_csv, csv_path).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@willisapi_client/services/metadata/archive.py`:
- Around line 48-50: The requests.put call that uploads to S3 (the line creating
put_res using presigned and data=f) lacks a timeout and can hang indefinitely;
update that call to include a sensible timeout argument (e.g., timeout=30) so
the upload fails fast, and ensure any surrounding error handling (where put_res
is inspected) still works with requests raising a Timeout/RequestException.
- Around line 33-35: The requests.post call that creates `res` in
willisapi_client/services/metadata/archive.py must include a timeout to avoid
indefinite hangs; update the `requests.post(wc.get_csv_archive_url(),
headers=_archive_headers(api_key), json=payload)` call to pass a timeout (e.g.,
`timeout=30`) or reference a new module-level constant `ARCHIVE_TIMEOUT`, and
ensure the surrounding code (where `res` is used) continues to handle
failures—optionally catch `requests.exceptions.Timeout` to log/raise a clear
error.
- Around line 75-79: The requests.patch call that finalizes CSV archive (the
line assigning res = requests.patch(...)) is missing a timeout; update that call
to include a timeout parameter (e.g., timeout=10 or a module-level constant like
ARCHIVE_REQUEST_TIMEOUT) so the finalize request cannot hang indefinitely, and
ensure any existing upstream configuration option is used if available; modify
or add the constant in this module and use it in the requests.patch(...)
invocation that calls wc.get_csv_archive_finalize_url(record_id) with headers
from _archive_headers(api_key).
In `@willisapi_client/services/metadata/upload.py`:
- Around line 112-120: The finalize_metadata_csv calls in upload() and
processed_upload() incorrectly pass a hardcoded "successful" status; compute the
status dynamically from the results counts (use successful_rows and total/failed
counts) and pass that computed string (e.g., "successful" only if failed_count
== 0, otherwise "partial" or "failed" as your domain requires) into
finalize_metadata_csv instead of the literal "successful"; update the calls in
upload() and processed_upload() to derive status from successful_rows and
len(results) - successful_rows before invoking finalize_metadata_csv.
---
Nitpick comments:
In `@willisapi_client/services/metadata/archive.py`:
- Around line 41-53: The code extracts record_id = body.get("record_id") but may
call finalize_metadata_csv(api_key, record_id, ...) with record_id == None; add
an early check after retrieving record_id (before the S3 upload and before
calling finalize_metadata_csv) to verify record_id is present and return or
handle the error if missing; update the branch that logs "CSV archive: no
presigned URL returned" and the failure branch that currently calls
finalize_metadata_csv to first validate record_id and avoid passing None into
finalize_metadata_csv (reference symbols: record_id, body.get, presigned,
finalize_metadata_csv, csv_path).
In `@willisapi_client/services/metadata/upload.py`:
- Around line 154-161: Remove the redundant int() cast around
csv.transformed_df.shape[0] when calling archive_metadata_csv; update the call
in upload.py (the archive_metadata_csv invocation that assigns
archive_record_id) to pass csv.transformed_df.shape[0] directly as the row count
argument so you rely on the existing integer value instead of wrapping it with
int().
- Around line 45-52: Remove the redundant int() cast around
csv.transformed_df.shape[0] in the call to archive_metadata_csv: pass
csv.transformed_df.shape[0] directly as the row count argument to
archive_metadata_csv (the call that assigns archive_record_id) instead of
int(csv.transformed_df.shape[0]), since shape[0] already yields an int.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4b530e7e-c9a5-4458-9258-76ea7b6859dc
📒 Files selected for processing (3)
willisapi_client/services/metadata/archive.pywillisapi_client/services/metadata/upload.pywillisapi_client/willisapi_client.py
🚧 Files skipped from review as they are similar to previous changes (1)
- willisapi_client/willisapi_client.py
| res = requests.post( | ||
| wc.get_csv_archive_url(), headers=_archive_headers(api_key), json=payload | ||
| ) |
There was a problem hiding this comment.
Add timeout to prevent indefinite hangs.
The requests.post call lacks a timeout parameter. Without a timeout, this network call could hang indefinitely, violating the documented requirement that "archiving must never abort the actual data upload."
⏱️ Proposed fix to add timeout
res = requests.post(
- wc.get_csv_archive_url(), headers=_archive_headers(api_key), json=payload
+ wc.get_csv_archive_url(),
+ headers=_archive_headers(api_key),
+ json=payload,
+ timeout=30
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| res = requests.post( | |
| wc.get_csv_archive_url(), headers=_archive_headers(api_key), json=payload | |
| ) | |
| res = requests.post( | |
| wc.get_csv_archive_url(), | |
| headers=_archive_headers(api_key), | |
| json=payload, | |
| timeout=30 | |
| ) |
🧰 Tools
🪛 Ruff (0.15.15)
[error] 33-33: Probable use of requests call without timeout
(S113)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@willisapi_client/services/metadata/archive.py` around lines 33 - 35, The
requests.post call that creates `res` in
willisapi_client/services/metadata/archive.py must include a timeout to avoid
indefinite hangs; update the `requests.post(wc.get_csv_archive_url(),
headers=_archive_headers(api_key), json=payload)` call to pass a timeout (e.g.,
`timeout=30`) or reference a new module-level constant `ARCHIVE_TIMEOUT`, and
ensure the surrounding code (where `res` is used) continues to handle
failures—optionally catch `requests.exceptions.Timeout` to log/raise a clear
error.
Source: Linters/SAST tools
| put_res = requests.put( | ||
| presigned, data=f, headers={"Content-Type": "text/csv"} | ||
| ) |
There was a problem hiding this comment.
Add timeout to prevent indefinite hangs.
The requests.put call lacks a timeout parameter. Without a timeout, the S3 upload could hang indefinitely, defeating the non-blocking nature of the archiving flow.
⏱️ Proposed fix to add timeout
put_res = requests.put(
- presigned, data=f, headers={"Content-Type": "text/csv"}
+ presigned,
+ data=f,
+ headers={"Content-Type": "text/csv"},
+ timeout=60
)🧰 Tools
🪛 Ruff (0.15.15)
[error] 48-48: Probable use of requests call without timeout
(S113)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@willisapi_client/services/metadata/archive.py` around lines 48 - 50, The
requests.put call that uploads to S3 (the line creating put_res using presigned
and data=f) lacks a timeout and can hang indefinitely; update that call to
include a sensible timeout argument (e.g., timeout=30) so the upload fails fast,
and ensure any surrounding error handling (where put_res is inspected) still
works with requests raising a Timeout/RequestException.
Source: Linters/SAST tools
Update utils.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@willisapi_client/services/metadata/utils.py`:
- Around line 424-426: Validation currently allows a missing language value but
generate_payload and generate_processed_payload access self.row.language
directly and can crash; update the validation checks (the blocks using
getattr(self.row, "language", None) around the language check) to treat a
missing language as invalid (e.g., if not language: return (False, "Missing
language")) so rows without language fail validation, and also make
generate_payload and generate_processed_payload resilient by reading language
via getattr(self.row, "language", None) (or providing a safe default) instead of
directly using self.row.language to prevent AttributeError if a missing language
slips through.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 037e397b-ed15-4e42-9674-5a82ea286d70
📒 Files selected for processing (1)
willisapi_client/services/metadata/utils.py
Description
[Describe the purpose and scope of this pull request.]
Changes Made
[List the changes made in this pull request.]
Checklist
Summary by CodeRabbit
New Features
Refactor
Behavior
Chores