
fix: LondonBoroughOfRichmondUponThames - rewrite for My Richmond page migration #1948

Open
InertiaUK wants to merge 1 commit into robbrad:master from InertiaUK:fix/LondonBoroughOfRichmondUponThames

Conversation


@InertiaUK InertiaUK commented Apr 12, 2026

Richmond moved their waste collection lookup from the old /services/waste_and_recycling/collection_days/ page to the My Richmond property portal at /my_richmond?pid=UPRN.

The old scraper looked for <a id="my_waste"> anchors between section markers. The new page uses <div class="my-item my-waste"> with <h4> bin type headings and <ul><li> collection dates.
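As a rough illustration of the new structure, the sketch below pulls bin types and dates out of a small hand-written snippet. The sample HTML, the date format ("20 April 2026"), and the helper names `first_date` and `parse_bins` are all assumptions made for the example; the live page and the PR's actual helpers may differ.

```python
import re
from datetime import datetime

# Hypothetical sample of the new markup; the real page may differ in detail.
SAMPLE_HTML = """
<div class="my-item my-waste">
  <h4>Rubbish and food</h4>
  <ul><li>Monday 20 April 2026</li></ul>
  <h4>Recycling</h4>
  <ul><li>Monday 20 April 2026</li><li>Monday 27 April 2026</li></ul>
</div>
"""

# Each <h4> heading is assumed to be followed directly by its <ul> of dates.
H4_UL = re.compile(r"<h4[^>]*>(.*?)</h4>\s*<ul[^>]*>(.*?)</ul>", re.I | re.S)
LI = re.compile(r"<li[^>]*>(.*?)</li>", re.I | re.S)
# Assumed date format: day, full month name, four-digit year.
DATE = re.compile(r"\b(\d{1,2}) ([A-Za-z]+) (\d{4})\b")


def first_date(text):
    """Return the first parseable date in text, or None if there is none."""
    match = DATE.search(text)
    if not match:
        return None
    try:
        return datetime.strptime(" ".join(match.groups()), "%d %B %Y").date()
    except ValueError:
        # Matched the shape of a date but not a real one (e.g. "31 Foo 2026").
        return None


def parse_bins(html):
    """Return (bin_type, date) pairs in document order."""
    bins = []
    for heading, ul_body in H4_UL.findall(html):
        bin_type = re.sub(r"<[^>]+>", "", heading).strip()
        for item in LI.findall(ul_body):
            parsed = first_date(item)
            if parsed is not None:
                bins.append((bin_type, parsed))
    return bins


if __name__ == "__main__":
    for bin_type, when in parse_bins(SAMPLE_HTML):
        print(bin_type, when.isoformat())
```

List items without a recognisable date are skipped here rather than passed through as messages, which is one possible reading of the stricter parsing described in this PR.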

Rewrote _extract_waste_block to match the new div.my-waste structure, with a fallback to the old anchor format. Simplified the overall scraper: removed unused imports and consolidated helpers.
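A minimal sketch of the new-then-legacy lookup order, for readers who want to see the shape of the fallback. The function name `extract_waste_block`, both regular expressions, and the idea that the legacy section ends at the next `<a id="my_...">` anchor are assumptions for illustration, not the code in this PR. The new-format pattern also assumes the waste div contains no nested `<div>`.

```python
import re

# New format: a <div> whose class list includes "my-waste".
# Assumes no nested <div> inside the block (the lazy match stops at the first </div>).
NEW_BLOCK = re.compile(
    r'<div[^>]*class="[^"]*\bmy-waste\b[^"]*"[^>]*>(.*?)</div>',
    re.I | re.S,
)

# Legacy format: everything after <a id="my_waste"> up to the next
# <a id="my_..."> anchor or the end of the document (assumed section marker).
LEGACY_BLOCK = re.compile(
    r'<a[^>]*\bid="my_waste"[^>]*>(.*?)(?=<a[^>]*\bid="my_|\Z)',
    re.I | re.S,
)


def extract_waste_block(html):
    """Return the inner HTML of the waste section, or None if not found."""
    for pattern in (NEW_BLOCK, LEGACY_BLOCK):
        match = pattern.search(html)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    new_html = '<div class="my-item my-waste"><h4>Recycling</h4></div>'
    old_html = '<a id="my_waste"></a><h4>Refuse</h4><a id="my_parking"></a>'
    print(extract_waste_block(new_html))
    print(extract_waste_block(old_html))
```

Trying the new pattern first means the legacy branch only runs if the council serves the old layout again, so the fallback should cost nothing on the current page.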

The PID (property ID) is a standard UPRN passed via the house number field. Tested with a real Richmond address.

Summary by CodeRabbit

  • Bug Fixes

    • Improved property identification validation to ensure accurate waste collection data retrieval
    • Enhanced compatibility with updated website structure for better data extraction
    • More consistent and reliable collection date parsing from website information
    • Stricter validation with clearer error messages for incomplete property data
  • Refactor

    • Optimized internal data parsing and processing workflows

Contributor

coderabbitai bot commented Apr 12, 2026

📝 Walkthrough


Updated the Richmond upon Thames council class to adapt HTML parsing for new waste section markup format, refined PID parameter handling to remove external passthrough while deriving from URL query or PAON data, and simplified User-Agent header construction.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| PID Parameter Handling<br>`uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py` | Removed `pid` parameter passthrough; now derives PID exclusively from URL query or PAON, raising `ValueError` if neither source provides a value. Updated URL construction to append `pid={pid}` only when not already present in base URL. |
| HTML Parsing Adaptation<br>`uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py` | Reworked `_extract_waste_block` to first attempt extraction from `<div>` with `my-waste` class, falling back to legacy `<a id="my_waste">` pattern. Refactored section slicing and `<ul>` content handling in `_parse_html_for_waste`. |
| Utility Function Updates<br>`uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py` | Replaced `_first_date_or_message` with `_first_date` to remove date-message fallback behavior. Simplified User-Agent header to single literal string. |
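The PID handling summarised above could look roughly like the following. This is a sketch under stated assumptions: `resolve_pid` and `build_url` are hypothetical names, the `example.org` URLs and numeric PIDs are placeholders, and the PR's real method signatures are not shown on this page.

```python
from urllib.parse import parse_qs, urlparse


def pid_from_url(url):
    """Return the pid query parameter from url, or None if absent."""
    if not url:
        return None
    return parse_qs(urlparse(url).query).get("pid", [None])[0]


def resolve_pid(url, paon):
    """Prefer a pid already in the URL, then the PAON (house number) field."""
    pid = pid_from_url(url)
    if not pid and paon:
        pid = str(paon).strip()
    if not pid:
        raise ValueError("No PID found: supply ?pid=UPRN in the URL or a UPRN as PAON")
    return pid


def build_url(base_url, pid):
    """Append pid=... only when the base URL does not already carry one."""
    if pid_from_url(base_url):
        return base_url
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}pid={pid}"


if __name__ == "__main__":
    # Placeholder host and PID values, not real Richmond data.
    base = "https://example.org/my_richmond"
    pid = resolve_pid(base, " 456 ")
    print(build_url(base, pid))
```

Raising `ValueError` early, before any request is made, keeps the failure close to the misconfiguration rather than surfacing later as an empty parse result.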

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • dp247

Poem

🐰 A rabbit hops through Richmond's waste,
HTML structures rearranged with haste,
Old patterns fade, new classes bloom,
PID parameters find their room,
Parsing refined—no message clutter,
Through twists and turns, code's all aquiver! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the primary change: rewriting the LondonBoroughOfRichmondUponThames scraper to handle the My Richmond page migration. It is specific, clear, and directly reflects the main objective of the changeset. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py (1)

96-103: ⚠️ Potential issue | 🟠 Major

Remove unnecessary broad exception handler in _pid_from_url.

The try-except Exception block at line 102 is broader than it needs to be: urlparse() and parse_qs() tolerate most malformed input and raise ValueError only in narrow cases, such as an unbalanced bracket in the host. Removing this handler ensures that real errors are not silently hidden and makes the code's intent clearer.

Suggested fix
     def _pid_from_url(self, url):
         if not url:
             return None
-        try:
-            q = parse_qs(urlparse(url).query)
-            return q.get("pid", [None])[0]
-        except Exception:
-            return None
+        q = parse_qs(urlparse(url).query)
+        return q.get("pid", [None])[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
around lines 96 - 103, Remove the unnecessary broad try/except in the
_pid_from_url method: keep the initial guard (if not url: return None), call
urlparse(url).query and parse_qs(...) to extract "pid" (q.get("pid", [None])[0])
and return it directly without catching Exception; this avoids swallowing real
errors from _pid_from_url while preserving the same None behavior for
empty/absent pid.
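If dropping the handler entirely feels too aggressive, a narrower variant is possible. The sketch below is not part of the suggested fix; it catches only `ValueError`, which `urlparse()` can raise for a URL whose host has an unbalanced `[` or `]`, and lets every other exception propagate. The standalone function name and the example URLs are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlparse


def pid_from_url(url):
    """Return the pid query parameter, or None if missing or the URL is malformed."""
    if not url:
        return None
    try:
        query = urlparse(url).query
    except ValueError:
        # Raised for an unbalanced bracket in the host, e.g. "http://[bad/...".
        return None
    # parse_qs drops blank values by default, so "pid=" yields no entry.
    return parse_qs(query).get("pid", [None])[0]


if __name__ == "__main__":
    for candidate in (
        "https://example.org/p?pid=100",
        "https://example.org/p?pid=",
        "http://[bad/p?pid=1",
    ):
        print(repr(candidate), "->", pid_from_url(candidate))
```

Whether a malformed URL should yield `None` or an error is a design choice for the maintainers; the point is only that the handler can name the exception it expects.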
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Line 13: The module docstring for LondonBoroughOfRichmondUponThames contains
an EN DASH (–) which triggers Ruff RUF002; open the module (symbol:
LondonBoroughOfRichmondUponThames or the top-level docstring) and replace the EN
DASH with a normal hyphen (-) in the string "Richmond upon Thames – parse My
Richmond property page." so it becomes "Richmond upon Thames - parse My Richmond
property page.".

---

Outside diff comments:
In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`:
- Around line 96-103: Remove the unnecessary broad try/except in the
_pid_from_url method: keep the initial guard (if not url: return None), call
urlparse(url).query and parse_qs(...) to extract "pid" (q.get("pid", [None])[0])
and return it directly without catching Exception; this avoids swallowing real
errors from _pid_from_url while preserving the same None behavior for
empty/absent pid.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f8545302-afec-459e-9764-ec6e39267b57

📥 Commits

Reviewing files that changed from the base of the PR and between 60bd3cc and 798033e.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py

Richmond upon Thames – parse the static My Property page.
No Selenium. No BeautifulSoup. Just requests + regex tailored to the current markup.
"""
"""Richmond upon Thames – parse My Richmond property page."""

⚠️ Potential issue | 🟡 Minor

Replace EN DASH in docstring to satisfy Ruff RUF002.

Line 13 uses an EN DASH (–); switch to a hyphen-minus (-) to clear the lint warning.

💡 Suggested patch
-    """Richmond upon Thames – parse My Richmond property page."""
+    """Richmond upon Thames - parse My Richmond property page."""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"""Richmond upon Thames – parse My Richmond property page."""
"""Richmond upon Thames - parse My Richmond property page."""
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 13-13: Docstring contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF002)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@uk_bin_collection/uk_bin_collection/councils/LondonBoroughOfRichmondUponThames.py`
at line 13, The module docstring for LondonBoroughOfRichmondUponThames contains
an EN DASH (–) which triggers Ruff RUF002; open the module (symbol:
LondonBoroughOfRichmondUponThames or the top-level docstring) and replace the EN
DASH with a normal hyphen (-) in the string "Richmond upon Thames – parse My
Richmond property page." so it becomes "Richmond upon Thames - parse My Richmond
property page.".

