Skip to content

1.9.x to main#143

Merged
vjbytes102 merged 9 commits into
mainfrom
1.9.x
Jun 14, 2026
Merged

1.9.x to main#143
vjbytes102 merged 9 commits into
mainfrom
1.9.x

Conversation

@dishantsethi

@dishantsethi dishantsethi commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Description

[Describe the purpose and scope of this pull request.]

Changes Made

[List the changes made in this pull request.]

Checklist

  • followed the coding style guidelines.
  • tested all changes.
  • updated the documentation to reflect these changes.
  • assigned reviewers to this pull request.

Summary by CodeRabbit

  • New Features

    • Metadata upload now creates server-side CSV archives (uploads via presigned URL) and finalizes them with success/failure counts.
  • Refactor

    • Language support consolidated into a single supported list; language choices are now derived from it (codes/order may differ).
  • Behavior

    • Language validation relaxed: rows without a language value are no longer rejected.
  • Chores

    • API host construction improved for environment-based URLs.
    • Package version bumped to 1.9.7

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds SUPPORTED_LANGUAGES and derives LANGUAGE_CHOICES from it; implements archive_metadata_csv and finalize_metadata_csv and wires them into metadata upload flows; refactors WillisapiClient host construction with get_api_host(); bumps version to 1.9.7.

Changes

Metadata archiving, language choices, and client URL updates

Layer / File(s) Summary
Language choices table consolidation
willisapi_client/services/metadata/language_choices.py
SUPPORTED_LANGUAGES centralizes (code, display_name) pairs and LANGUAGE_CHOICES is now computed from it.
Metadata archive API
willisapi_client/services/metadata/archive.py
Adds _archive_headers, archive_metadata_csv to POST init and upload CSV to presigned URL, and finalize_metadata_csv to PATCH final status and counts.
Upload flows integrate archiving
willisapi_client/services/metadata/upload.py
Imports and calls archive_metadata_csv before uploads and finalize_metadata_csv after processing in both upload and processed_upload.
API host and URL builder refactor
willisapi_client/willisapi_client.py
Adds get_api_host(), defaults api_uri to api.willis.health, and routes get_base_url()/get_base_v2_url() through the new host logic.
Version bump
willisapi_client/__version__.py
Update __latestVersion__ from "1.9.6" to "1.9.7".

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant WillisAPI
  participant S3
  Client->>WillisAPI: POST /metadata/archive (init payload)
  WillisAPI-->>Client: 200 { record_id, presigned_url }
  Client->>S3: PUT presigned_url (upload CSV)
  S3-->>Client: 200 OK
  Client->>WillisAPI: PATCH /metadata/archive/{record_id} (status, counts)
  WillisAPI-->>Client: 200 OK
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • vjbytes102

Poem

A CSV sails out to cloud and back,
Languages gathered in one tidy stack,
Hosts resolve by env with a brand-new knack,
Version nudged forward — neat little hack,
Archive and finalize — applause, unpack! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description contains only placeholder template text with all sections unfilled and no actual content describing the changes, scope, or checklist verification. Fill in all template sections: provide a clear purpose/scope description, list the specific changes made (language refactoring, metadata archiving, API host changes, version bump), and check off completed items.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title '1.9.x to main' is vague and does not clearly summarize the actual changes made in the pull request. Use a descriptive title that highlights the main changes, such as 'Refactor language choices and add metadata archiving support' or a similar summary of key modifications.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 1.9.x

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
willisapi_client/services/metadata/language_choices.py (1)

119-119: ⚡ Quick win

Use a set for validator membership checks.

Line 119 builds a list, while UploadUtils.validate_row and validate_processed_data_row repeatedly perform membership checks. Exposing a set reduces per-row lookup cost on large uploads without changing existing API.

♻️ Proposed refactor
 LANGUAGE_CHOICES = [code for code, _ in SUPPORTED_LANGUAGES]
+LANGUAGE_CHOICES_SET = set(LANGUAGE_CHOICES)
-from .language_choices import (
-    LANGUAGE_CHOICES,
-)
+from .language_choices import (
+    LANGUAGE_CHOICES,
+    LANGUAGE_CHOICES_SET,
+)

-        if self.row.language not in LANGUAGE_CHOICES:
+        if self.row.language not in LANGUAGE_CHOICES_SET:
             return (False, f"Invalid language: {self.row.language}")

-        if self.row.language not in LANGUAGE_CHOICES:
+        if self.row.language not in LANGUAGE_CHOICES_SET:
             return (False, f"Invalid language: {self.row.language}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@willisapi_client/services/metadata/language_choices.py` at line 119,
LANGUAGE_CHOICES is currently a list comprehension which causes O(n) membership
checks in UploadUtils.validate_row and validate_processed_data_row; change
LANGUAGE_CHOICES to a set built from SUPPORTED_LANGUAGES (e.g., set(code for
code, _ in SUPPORTED_LANGUAGES) or {code for code, _ in SUPPORTED_LANGUAGES}) so
membership tests against LANGUAGE_CHOICES in UploadUtils.validate_row and
validate_processed_data_row become O(1) without changing the public API.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@willisapi_client/willisapi_client.py`:
- Around line 20-23: The get_api_host method currently interpolates self.env
directly into the hostname; validate and sanitize self.env before using it: in
get_api_host (and where self.env is set if applicable) ensure self.env matches a
DNS label pattern (e.g. only [A-Za-z0-9-], optionally lowercased) and reject or
normalize invalid characters (replace disallowed chars with '-' or raise
ValueError); if validation fails, fall back to returning self.api_uri instead of
constructing "api.{env}.willis.health". Ensure references to self.env and the
constructed host string are updated accordingly.

---

Nitpick comments:
In `@willisapi_client/services/metadata/language_choices.py`:
- Line 119: LANGUAGE_CHOICES is currently a list comprehension which causes O(n)
membership checks in UploadUtils.validate_row and validate_processed_data_row;
change LANGUAGE_CHOICES to a set built from SUPPORTED_LANGUAGES (e.g., set(code
for code, _ in SUPPORTED_LANGUAGES) or {code for code, _ in
SUPPORTED_LANGUAGES}) so membership tests against LANGUAGE_CHOICES in
UploadUtils.validate_row and validate_processed_data_row become O(1) without
changing the public API.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 53d1a203-4364-4f93-ac19-286eb3ab7438

📥 Commits

Reviewing files that changed from the base of the PR and between 3740f56 and 8c46dde.

📒 Files selected for processing (2)
  • willisapi_client/services/metadata/language_choices.py
  • willisapi_client/willisapi_client.py

Comment on lines +20 to +23
def get_api_host(self):
if self.env:
return f"https://{self.env}-{self.api_uri}/v{self.api_version}/"
return f"https://{self.api_uri}/v{self.api_version}/"
return f"api.{self.env}.willis.health"
return self.api_uri

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate env before interpolating it into the host.

At Line 21–Line 22, env is inserted directly into the hostname. If env contains invalid label characters (for example, whitespace or /), URL construction becomes invalid and downstream API calls can fail at runtime.

Suggested fix
+import re
 import setuptools
 import math
@@
     def get_api_host(self):
         if self.env:
-            return f"api.{self.env}.willis.health"
+            env = self.env.strip().lower()
+            if not re.fullmatch(r"[a-z0-9-]+", env):
+                raise ValueError("Invalid env; expected alphanumeric/hyphen label.")
+            return f"api.{env}.willis.health"
         return self.api_uri
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@willisapi_client/willisapi_client.py` around lines 20 - 23, The get_api_host
method currently interpolates self.env directly into the hostname; validate and
sanitize self.env before using it: in get_api_host (and where self.env is set if
applicable) ensure self.env matches a DNS label pattern (e.g. only [A-Za-z0-9-],
optionally lowercased) and reject or normalize invalid characters (replace
disallowed chars with '-' or raise ValueError); if validation fails, fall back
to returning self.api_uri instead of constructing "api.{env}.willis.health".
Ensure references to self.env and the constructed host string are updated
accordingly.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
willisapi_client/services/metadata/upload.py (2)

154-161: 💤 Low value

Consider removing redundant int() cast.

Same as the earlier archive call: csv.transformed_df.shape[0] already returns an integer.

🔧 Optional simplification
         archive_record_id = archive_metadata_csv(
             api_key,
             csv_path,
-            int(csv.transformed_df.shape[0]),
+            csv.transformed_df.shape[0],
             upload_type="processed_data",
             env=kwargs.get("env"),
         )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@willisapi_client/services/metadata/upload.py` around lines 154 - 161, Remove
the redundant int() cast around csv.transformed_df.shape[0] when calling
archive_metadata_csv; update the call in upload.py (the archive_metadata_csv
invocation that assigns archive_record_id) to pass csv.transformed_df.shape[0]
directly as the row count argument so you rely on the existing integer value
instead of wrapping it with int().

45-52: 💤 Low value

Consider removing redundant int() cast.

csv.transformed_df.shape[0] already returns an integer, so wrapping it in int() is unnecessary. However, this is harmless and might be defensive coding.

🔧 Optional simplification
         archive_record_id = archive_metadata_csv(
             api_key,
             csv_path,
-            int(csv.transformed_df.shape[0]),
+            csv.transformed_df.shape[0],
             upload_type="data",
             env=kwargs.get("env"),
         )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@willisapi_client/services/metadata/upload.py` around lines 45 - 52, Remove
the redundant int() cast around csv.transformed_df.shape[0] in the call to
archive_metadata_csv: pass csv.transformed_df.shape[0] directly as the row count
argument to archive_metadata_csv (the call that assigns archive_record_id)
instead of int(csv.transformed_df.shape[0]), since shape[0] already yields an
int.
willisapi_client/services/metadata/archive.py (1)

41-53: 💤 Low value

Verify record_id before using it in finalize call.

If body.get("record_id") returns None, the finalize call on line 53 receives None. While finalize_metadata_csv guards against this on line 66, the intent would be clearer if the check happened here before calling finalize.

🔍 Proposed improvement for clarity
         body = res.json()
         record_id = body.get("record_id")
         presigned = body.get("presigned_url")
-        if not presigned:
+        if not record_id or not presigned:
-            logger.warning("CSV archive: no presigned URL returned")
+            logger.warning("CSV archive: missing record_id or presigned URL")
             return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@willisapi_client/services/metadata/archive.py` around lines 41 - 53, The code
extracts record_id = body.get("record_id") but may call
finalize_metadata_csv(api_key, record_id, ...) with record_id == None; add an
early check after retrieving record_id (before the S3 upload and before calling
finalize_metadata_csv) to verify record_id is present and return or handle the
error if missing; update the branch that logs "CSV archive: no presigned URL
returned" and the failure branch that currently calls finalize_metadata_csv to
first validate record_id and avoid passing None into finalize_metadata_csv
(reference symbols: record_id, body.get, presigned, finalize_metadata_csv,
csv_path).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@willisapi_client/services/metadata/archive.py`:
- Around line 48-50: The requests.put call that uploads to S3 (the line creating
put_res using presigned and data=f) lacks a timeout and can hang indefinitely;
update that call to include a sensible timeout argument (e.g., timeout=30) so
the upload fails fast, and ensure any surrounding error handling (where put_res
is inspected) still works with requests raising a Timeout/RequestException.
- Around line 33-35: The requests.post call that creates `res` in
willisapi_client/services/metadata/archive.py must include a timeout to avoid
indefinite hangs; update the `requests.post(wc.get_csv_archive_url(),
headers=_archive_headers(api_key), json=payload)` call to pass a timeout (e.g.,
`timeout=30`) or reference a new module-level constant `ARCHIVE_TIMEOUT`, and
ensure the surrounding code (where `res` is used) continues to handle
failures—optionally catch `requests.exceptions.Timeout` to log/raise a clear
error.
- Around line 75-79: The requests.patch call that finalizes CSV archive (the
line assigning res = requests.patch(...)) is missing a timeout; update that call
to include a timeout parameter (e.g., timeout=10 or a module-level constant like
ARCHIVE_REQUEST_TIMEOUT) so the finalize request cannot hang indefinitely, and
ensure any existing upstream configuration option is used if available; modify
or add the constant in this module and use it in the requests.patch(...)
invocation that calls wc.get_csv_archive_finalize_url(record_id) with headers
from _archive_headers(api_key).

In `@willisapi_client/services/metadata/upload.py`:
- Around line 112-120: The finalize_metadata_csv calls in upload() and
processed_upload() incorrectly pass a hardcoded "successful" status; compute the
status dynamically from the results counts (use successful_rows and total/failed
counts) and pass that computed string (e.g., "successful" only if failed_count
== 0, otherwise "partial" or "failed" as your domain requires) into
finalize_metadata_csv instead of the literal "successful"; update the calls in
upload() and processed_upload() to derive status from successful_rows and
len(results) - successful_rows before invoking finalize_metadata_csv.

---

Nitpick comments:
In `@willisapi_client/services/metadata/archive.py`:
- Around line 41-53: The code extracts record_id = body.get("record_id") but may
call finalize_metadata_csv(api_key, record_id, ...) with record_id == None; add
an early check after retrieving record_id (before the S3 upload and before
calling finalize_metadata_csv) to verify record_id is present and return or
handle the error if missing; update the branch that logs "CSV archive: no
presigned URL returned" and the failure branch that currently calls
finalize_metadata_csv to first validate record_id and avoid passing None into
finalize_metadata_csv (reference symbols: record_id, body.get, presigned,
finalize_metadata_csv, csv_path).

In `@willisapi_client/services/metadata/upload.py`:
- Around line 154-161: Remove the redundant int() cast around
csv.transformed_df.shape[0] when calling archive_metadata_csv; update the call
in upload.py (the archive_metadata_csv invocation that assigns
archive_record_id) to pass csv.transformed_df.shape[0] directly as the row count
argument so you rely on the existing integer value instead of wrapping it with
int().
- Around line 45-52: Remove the redundant int() cast around
csv.transformed_df.shape[0] in the call to archive_metadata_csv: pass
csv.transformed_df.shape[0] directly as the row count argument to
archive_metadata_csv (the call that assigns archive_record_id) instead of
int(csv.transformed_df.shape[0]), since shape[0] already yields an int.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4b530e7e-c9a5-4458-9258-76ea7b6859dc

📥 Commits

Reviewing files that changed from the base of the PR and between 5a4525d and b0043ae.

📒 Files selected for processing (3)
  • willisapi_client/services/metadata/archive.py
  • willisapi_client/services/metadata/upload.py
  • willisapi_client/willisapi_client.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • willisapi_client/willisapi_client.py

Comment on lines +33 to +35
res = requests.post(
wc.get_csv_archive_url(), headers=_archive_headers(api_key), json=payload
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add timeout to prevent indefinite hangs.

The requests.post call lacks a timeout parameter. Without a timeout, this network call could hang indefinitely, violating the documented requirement that "archiving must never abort the actual data upload."

⏱️ Proposed fix to add timeout
         res = requests.post(
-            wc.get_csv_archive_url(), headers=_archive_headers(api_key), json=payload
+            wc.get_csv_archive_url(), 
+            headers=_archive_headers(api_key), 
+            json=payload,
+            timeout=30
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
res = requests.post(
wc.get_csv_archive_url(), headers=_archive_headers(api_key), json=payload
)
res = requests.post(
wc.get_csv_archive_url(),
headers=_archive_headers(api_key),
json=payload,
timeout=30
)
🧰 Tools
🪛 Ruff (0.15.15)

[error] 33-33: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@willisapi_client/services/metadata/archive.py` around lines 33 - 35, The
requests.post call that creates `res` in
willisapi_client/services/metadata/archive.py must include a timeout to avoid
indefinite hangs; update the `requests.post(wc.get_csv_archive_url(),
headers=_archive_headers(api_key), json=payload)` call to pass a timeout (e.g.,
`timeout=30`) or reference a new module-level constant `ARCHIVE_TIMEOUT`, and
ensure the surrounding code (where `res` is used) continues to handle
failures—optionally catch `requests.exceptions.Timeout` to log/raise a clear
error.

Source: Linters/SAST tools

Comment on lines +48 to +50
put_res = requests.put(
presigned, data=f, headers={"Content-Type": "text/csv"}
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add timeout to prevent indefinite hangs.

The requests.put call lacks a timeout parameter. Without a timeout, the S3 upload could hang indefinitely, defeating the non-blocking nature of the archiving flow.

⏱️ Proposed fix to add timeout
             put_res = requests.put(
-                presigned, data=f, headers={"Content-Type": "text/csv"}
+                presigned, 
+                data=f, 
+                headers={"Content-Type": "text/csv"},
+                timeout=60
             )
🧰 Tools
🪛 Ruff (0.15.15)

[error] 48-48: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@willisapi_client/services/metadata/archive.py` around lines 48 - 50, The
requests.put call that uploads to S3 (the line creating put_res using presigned
and data=f) lacks a timeout and can hang indefinitely; update that call to
include a sensible timeout argument (e.g., timeout=30) so the upload fails fast,
and ensure any surrounding error handling (where put_res is inspected) still
works with requests raising a Timeout/RequestException.

Source: Linters/SAST tools

Comment thread willisapi_client/services/metadata/archive.py
Comment thread willisapi_client/services/metadata/upload.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@willisapi_client/services/metadata/utils.py`:
- Around line 424-426: Validation currently allows a missing language value but
generate_payload and generate_processed_payload access self.row.language
directly and can crash; update the validation checks (the blocks using
getattr(self.row, "language", None) around the language check) to treat a
missing language as invalid (e.g., if not language: return (False, "Missing
language")) so rows without language fail validation, and also make
generate_payload and generate_processed_payload resilient by reading language
via getattr(self.row, "language", None) (or providing a safe default) instead of
directly using self.row.language to prevent AttributeError if a missing language
slips through.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 037e397b-ed15-4e42-9674-5a82ea286d70

📥 Commits

Reviewing files that changed from the base of the PR and between b0043ae and 40a6fe9.

📒 Files selected for processing (1)
  • willisapi_client/services/metadata/utils.py

Comment thread willisapi_client/services/metadata/utils.py
@vjbytes102 vjbytes102 merged commit 9df55f1 into main Jun 14, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants