fix: TorridgeDistrictCouncil - parse new relative-date SOAP format by InertiaUK · Pull Request #1932 · robbrad/UKBinCollectionData

InertiaUK · 2026-04-08T16:25:20Z

What this changes

Torridge's SOAP endpoint (getRoundCalendarForUPRN) now returns a completely different payload. Where it used to return explicit dates like Mon 14 Apr, it now returns relative phrases plus an embedded calendar table:

<b>Refuse</b>: Tomorrow then every Mon<br>
<b>Recycling</b>: No Recycling waste collection for this address<br>
...
<table id="CalTab00" class="newCalendarTable">...</table>

The old regex ([A-Za-z]+ \d\d? [A-Za-z]+) returns nothing against "Tomorrow then every Mon", so parse_data silently produced an empty bins list.
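
To make the failure concrete, here is a minimal check of the regex quoted above against both formats. The two sample strings are illustrative; the old-format line is assumed from the description rather than copied from a real response.

```python
import re

# The date pattern quoted in the PR description: "<word> <1-2 digits> <word>".
OLD_DATE_RE = re.compile(r"([A-Za-z]+ \d\d? [A-Za-z]+)")

old_line = "Refuse: Mon 14 Apr"               # assumed shape of an old-format line
new_line = "Refuse: Tomorrow then every Mon"  # phrase from the new payload

old_matches = OLD_DATE_RE.findall(old_line)
new_matches = OLD_DATE_RE.findall(new_line)

print(old_matches)  # the explicit date is found
print(new_matches)  # no digits in the phrase, so nothing matches
```

Because the new phrase contains no digits at all, the pattern cannot match, and a parser that only appends on a match ends up with no bins.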

What the new parser does

Recognises relative phrases: Today, Tomorrow, and <Weekday> (e.g. every Mon -> next Monday).
Still handles the old explicit Mon 14 Apr format as a fallback, in case Torridge ever reverts.
Gracefully skips lines like No Recycling waste collection for this address (some UPRNs only have a subset of services).
Keeps the SOAPAction + charset header from the earlier iteration of this PR.
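
The rules above can be sketched roughly as follows. This is not the code from the PR: `extract_base_date`, its signature, the order of the rules, the year roll-over for explicit dates, and the choice to return today when the weekday already matches are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]  # date.weekday() order
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
EXPLICIT_RE = re.compile(r"\b[a-z]{3,} (\d\d?) ([a-z]{3})")
WEEKDAY_RE = re.compile(
    r"\b(mon|tue|wed|thu|fri|sat|sun)(?:day|sday|nesday|rsday|urday)?\b"
)


def extract_base_date(phrase: str, today: date) -> Optional[date]:
    """Return the next collection date implied by `phrase`, or None."""
    text = phrase.strip().lower()

    # "No Recycling waste collection for this address" -> nothing to report.
    if text.startswith("no "):
        return None
    if text.startswith("today"):
        return today
    if text.startswith("tomorrow"):
        return today + timedelta(days=1)

    # Old explicit format, e.g. "Mon 14 Apr". It carries no year, so assume
    # the current one and roll over if that date has already passed.
    explicit = EXPLICIT_RE.search(text)
    if explicit and explicit.group(2) in MONTHS:
        day = int(explicit.group(1))
        month = MONTHS.index(explicit.group(2)) + 1
        try:
            candidate = date(today.year, month, day)
            if candidate < today:
                candidate = candidate.replace(year=today.year + 1)
        except ValueError:  # impossible calendar date such as 31 Feb
            return None
        return candidate

    # Bare weekday, e.g. "every Mon": next occurrence, counting today itself.
    weekday = WEEKDAY_RE.search(text)
    if weekday:
        days_ahead = (WEEKDAYS.index(weekday.group(1)) - today.weekday()) % 7
        return today + timedelta(days=days_ahead)
    return None


today = date(2026, 4, 12)
for phrase in ("Tomorrow then every Mon", "Today then every Tue",
               "every Mon", "Mon 14 Apr",
               "No Recycling waste collection for this address"):
    result = extract_base_date(phrase, today)
    print(phrase, "->", result.strftime("%d/%m/%Y") if result else None)
```

Checking `Today`/`Tomorrow` before the weekday rule matters: "Tomorrow then every Mon" also contains a weekday, and the relative word is the more specific hint.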

Testing

Verified via the VPS wrapper against UPRN 10091078762:

{
  "bins": [
    { "type": "Refuse", "collectionDate": "13/04/2026" }
  ]
}

13/04/2026 is the Monday immediately after today (12/04/2026), which matches the server's "Tomorrow then every Mon" phrase.
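
The date arithmetic behind that check is easy to reproduce. A small sketch, using only the two dates quoted above:

```python
from datetime import date, timedelta

today = date(2026, 4, 12)                # the "today" quoted above
collection = today + timedelta(days=1)   # "Tomorrow"

# date.weekday() counts Monday as 0 and Sunday as 6.
print(today.weekday())
print(collection.weekday())
print(collection.strftime("%d/%m/%Y"))
```

If the server's "Tomorrow" and "every Mon" agree, the second value is 0 (Monday), which is the case here.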

Supersedes

This replaces the previous commit on this branch (header + regex tidy). Force-pushed.

Summary by CodeRabbit

Bug Fixes
- Improved reliability of bin collection data parsing for Torridge District Council.
- Enhanced handling of collection dates, including support for relative dates (today, tomorrow) and weekday names.
- Strengthened input validation for required parameters.

coderabbitai · 2026-04-08T16:25:29Z

Walkthrough

Updates Torridge District Council's bin collection parser in three ways: the SOAP request now sends a SOAPAction header, UPRN validation is stricter, and bin parsing no longer relies on hardcoded bin types. Instead, the parser extracts bin names generically from bold HTML elements, filters out invalid entries, and normalizes the collection dates.

Changes

| Cohort / File(s) | Summary |
|---|---|
| SOAP Request/Response & Bin Parsing Refactor `uk_bin_collection/uk_bin_collection/councils/TorridgeDistrictCouncil.py` | Enhanced SOAP request handling with SOAPAction header; stricter UPRN validation via explicit ValueError; replaced conditional bin-type branches with a generic loop over bold elements; added filtering logic to skip year-like patterns and invalid entries; implemented a new `_extract_base_date()` method supporting relative dates (today/tomorrow), explicit day/month/year patterns, and weekday phrases with fallback. |
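
A rough sketch of what a generic loop over bold elements might look like. The PR uses BeautifulSoup; this version swaps in the standard library's `html.parser` so that it runs without dependencies. `BoldPairParser`, `extract_bins`, and the bold year inside the sample table are hypothetical (the real table contents are elided in the PR description).

```python
import re
from html.parser import HTMLParser


class BoldPairParser(HTMLParser):
    """Collect [bold label, text that follows it] pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: list[list[str]] = []
        self._in_bold = False

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._in_bold = True
            self.pairs.append(["", ""])

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_bold = False

    def handle_data(self, data):
        if not self.pairs:
            return  # text before the first <b> has no label to attach to
        slot = 0 if self._in_bold else 1
        self.pairs[-1][slot] += data


def extract_bins(html: str) -> dict[str, str]:
    parser = BoldPairParser()
    parser.feed(html)
    parser.close()
    bins: dict[str, str] = {}
    for label, trailing in parser.pairs:
        label = label.strip()
        phrase = trailing.strip().lstrip(":").strip()
        # Skip empty labels and year-like labels such as "2026".
        if not label or re.fullmatch(r"\d{4}", label):
            continue
        # Skip "No ... collection for this address" placeholders.
        if not phrase or phrase.lower().startswith("no "):
            continue
        bins[label] = phrase
    return bins


# First two lines come from the PR description; the table body is invented.
SAMPLE = (
    "<b>Refuse</b>: Tomorrow then every Mon<br>"
    "<b>Recycling</b>: No Recycling waste collection for this address<br>"
    '<table id="CalTab00" class="newCalendarTable">'
    "<tr><td><b>2026</b></td></tr></table>"
)

print(extract_bins(SAMPLE))
```

Looping over whatever bold labels appear, rather than branching on known bin names, means a newly introduced service would show up without a code change, at the cost of needing filters for bold text that is not a bin type.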

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Bold elements now dance in a circular way,
Instead of hardcoded bins from yesterday,
SOAP headers fly and dates normalize true,
Generic parsing brings a fresh morning dew,
Tomorrow and today both find their place anew! 📅✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately summarizes the main change: fixing Torridge's SOAP parser to handle the new relative-date format, with SOAPAction header support. |


Torridge's SOAP API changed its response from explicit dates ("Mon 14 Apr") to relative phrases ("Tomorrow then every Mon", "Today then every Tue") with an embedded calendar table. The old regex-based parser returned empty because it expected the old format. The rewritten parser handles Today/Tomorrow/weekday-name phrases, falls back to the old explicit format if it reappears, and gracefully skips "No X collection for this address" entries.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/TorridgeDistrictCouncil.py`:
- Around line 62-64: The code currently sets inner_html = result.text if result
is not None else "" which hides missing SOAP nodes; change this to fail fast by
checking the SOAP response node (result) and its text before creating the
BeautifulSoup. Specifically, in the parsing routine where variables result and
inner_html are used (the block that assigns inner_html and then calls
BeautifulSoup), raise an explicit exception (e.g., ValueError or a custom
ParseError) when result is None or result.text is empty/None with a clear
message indicating the missing SOAP result for TorridgeDistrictCouncil, instead
of defaulting to an empty string; this will surface upstream format changes
immediately.
- Around line 55-56: The code parses untrusted SOAP XML using
xml.etree.ElementTree.fromstring (vulnerable to XXE/billion-laughs); add
defusedxml to project dependencies (pyproject.toml) and replace imports/usages
of ElementTree in TorridgeDistrictCouncil.py to use defusedxml.ElementTree
(e.g., update the import and keep calling
defusedxml.ElementTree.fromstring(dom_text) and dom.find(...) as before) so
parsing uses the safe defusedxml implementation.
- Around line 49-50: The POST call at resp = requests.post(url, headers=headers,
data=post_data, verify=False) should include a timeout and explicit HTTP error
handling: add a timeout argument (e.g. timeout=10) to the requests.post call,
immediately call resp.raise_for_status() after the response to turn non-2xx
status codes into exceptions, and wrap the request/raise in a try/except that
catches requests.exceptions.RequestException to log or propagate a clear error
(including status/code/message) instead of allowing downstream XML parsing to
fail; update the function containing this call (the method with resp =
requests.post) to handle the exception path cleanly (log and return/raise a
descriptive error).


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1ffbf182-90ca-4973-955e-02c2c052c3ce

📥 Commits

Reviewing files that changed from the base of the PR and between 60bd3cc and f1caa13.

📒 Files selected for processing (1)

uk_bin_collection/uk_bin_collection/councils/TorridgeDistrictCouncil.py

coderabbitai · 2026-04-12T10:38:49Z

uk_bin_collection/uk_bin_collection/councils/TorridgeDistrictCouncil.py

+        resp = requests.post(url, headers=headers, data=post_data, verify=False)



⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify SOAP calls in council parsers and whether timeout / raise_for_status are present
rg -n --type=py -C2 'requests\.post\(' uk_bin_collection/uk_bin_collection/councils
rg -n --type=py -C2 'raise_for_status\(' uk_bin_collection/uk_bin_collection/councils

Repository: robbrad/UKBinCollectionData

Length of output: 50383

🏁 Script executed:

cat -n uk_bin_collection/uk_bin_collection/councils/TorridgeDistrictCouncil.py | sed -n '40,70p'

Repository: robbrad/UKBinCollectionData

Length of output: 1554

Add timeout and HTTP error handling to the SOAP request.

Line 49 performs a network call without timeout and without raise_for_status(). This can hang indefinitely and fail with confusing XML parsing errors instead of a clear HTTP error.

Suggested patch

-        resp = requests.post(url, headers=headers, data=post_data, verify=False)
+        try:
+            resp = requests.post(
+                url,
+                headers=headers,
+                data=post_data,
+                verify=False,
+                timeout=30,
+            )
+            resp.raise_for_status()
+        except requests.RequestException as ex:
+            raise ValueError("Failed to retrieve bin data from Torridge SOAP endpoint") from ex



coderabbitai · 2026-04-12T10:38:49Z

uk_bin_collection/uk_bin_collection/councils/TorridgeDistrictCouncil.py

+        dom = ElementTree.fromstring(resp.text)
+        result = dom.find(


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify current XML parser usage and whether defusedxml is already used
rg -n --type=py -C2 'from xml\.etree import ElementTree|ElementTree\.fromstring\('
rg -n --type=py -C2 'defusedxml'
fd -i 'pyproject.toml|requirements*.txt|setup.py|Pipfile|poetry.lock'

Repository: robbrad/UKBinCollectionData

Length of output: 1919

🏁 Script executed:

cat pyproject.toml

Repository: robbrad/UKBinCollectionData

Length of output: 2145

🏁 Script executed:

# Check for defusedxml in poetry.lock
rg -n 'defusedxml' poetry.lock | head -20

Repository: robbrad/UKBinCollectionData

Length of output: 671

🏁 Script executed:

rg -n 'ElementTree\.fromstring' --type=py

Repository: robbrad/UKBinCollectionData

Length of output: 320

Add defusedxml dependency and use it for parsing remote SOAP XML to prevent XML attacks.

Line 55 parses untrusted remote SOAP XML with xml.etree.ElementTree.fromstring, which is vulnerable to XML attacks (XXE, billion laughs). Replace with defusedxml.ElementTree. First, add defusedxml to pyproject.toml dependencies, then update the import:

Suggested patch

-from xml.etree import ElementTree
+from defusedxml import ElementTree

🧰 Tools

🪛 Ruff (0.15.9)

[error] 55-55: Using xml to parse untrusted data is known to be vulnerable to XML attacks; use defusedxml equivalents

(S314)


coderabbitai · 2026-04-12T10:38:49Z

uk_bin_collection/uk_bin_collection/councils/TorridgeDistrictCouncil.py

+        inner_html = result.text if result is not None else ""

+        soup = BeautifulSoup(inner_html, features="html.parser")


⚠️ Potential issue | 🟠 Major

Fail fast if the SOAP result node is missing or empty.

Line 62 silently defaults to "", which can mask upstream format changes as “no bins”. Raise an explicit exception when result/result.text is missing.

Suggested patch

-        inner_html = result.text if result is not None else ""
+        if result is None or not result.text:
+            raise ValueError(
+                "Torridge SOAP response missing getRoundCalendarForUPRNResult content"
+            )
+        inner_html = result.text

Based on learnings: In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures over silent defaults or swallowed errors.
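
For illustration, the fail-fast check can be exercised against a hand-written envelope. The namespace URI, the envelope shape, and the `extract_inner_html` helper are assumptions; only the `getRoundCalendarForUPRNResult` element name and the error message come from the review. The sketch uses the standard library parser on a trusted literal; with the defusedxml change suggested earlier in the review, only the import would differ.

```python
from xml.etree import ElementTree  # review suggests defusedxml for remote data

# Hypothetical service namespace; the real one is not shown in the PR.
NS = {"svc": "http://example.invalid/torridge"}

ENVELOPE = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body>"
    '<getRoundCalendarForUPRNResponse xmlns="http://example.invalid/torridge">'
    "{body}"
    "</getRoundCalendarForUPRNResponse>"
    "</soap:Body></soap:Envelope>"
)


def extract_inner_html(soap_xml: str) -> str:
    """Return the HTML carried in the result node, or raise if it is absent."""
    dom = ElementTree.fromstring(soap_xml)
    result = dom.find(".//svc:getRoundCalendarForUPRNResult", NS)
    if result is None or not result.text:
        raise ValueError(
            "Torridge SOAP response missing getRoundCalendarForUPRNResult content"
        )
    return result.text


# The HTML travels as escaped text inside the result element.
good = ENVELOPE.format(
    body="<getRoundCalendarForUPRNResult>"
         "&lt;b&gt;Refuse&lt;/b&gt;: Tomorrow then every Mon&lt;br&gt;"
         "</getRoundCalendarForUPRNResult>"
)
missing = ENVELOPE.format(body="")  # response without a result node

print(extract_inner_html(good))
try:
    extract_inner_html(missing)
except ValueError as err:
    print("failed fast:", err)
```

With the silent `""` default, the second envelope would have produced an empty bins list that looks the same as an address with no collections; raising makes the two cases distinguishable.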



InertiaUK force-pushed the fix/TorridgeDistrictCouncil branch from b5ca9e1 to f1caa13 Compare April 12, 2026 10:35

InertiaUK changed the title ~~fix: TorridgeDistrictCouncil - add SOAPAction header and tighten date regex~~ fix: TorridgeDistrictCouncil - parse new relative-date SOAP format Apr 12, 2026

coderabbitai bot reviewed Apr 12, 2026

InertiaUK wants to merge 1 commit into robbrad:master from InertiaUK:fix/TorridgeDistrictCouncil
