Skip to content

fix: MorayCouncil - rewrite for annual calendar (7-day view removed)#1941

Open
InertiaUK wants to merge 1 commit intorobbrad:masterfrom
InertiaUK:fix/MorayCouncil
Open

fix: MorayCouncil - rewrite for annual calendar (7-day view removed)#1941
InertiaUK wants to merge 1 commit intorobbrad:masterfrom
InertiaUK:fix/MorayCouncil

Conversation

@InertiaUK
Copy link
Copy Markdown

@InertiaUK InertiaUK commented Apr 10, 2026

Moray Council removed their 7-day schedule view and replaced it with an annual calendar page. The old scraper was looking for elements that no longer exist.

Rewrote the scraper to parse the annual calendar at /my-area/bins. Each collection date is encoded in CSS classes on the calendar day cells, with the bin type determined by the class name. Parses all future dates from the current month onwards.

Tested with an IV postcode in Moray.

Summary by CodeRabbit

  • Bug Fixes
    • Updated bin collection data source to fetch correct annual calendar
    • Improved bin type identification and parsing accuracy
    • Added HTTP response validation to handle errors gracefully
    • Filtered collection schedules to show only upcoming dates
    • Results now sorted chronologically by collection date for easier reference

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

The MorayCouncil scraper's data-fetching logic was refactored to use the annual calendar endpoint instead of the previous "disp_bins" endpoint. HTML parsing strategy changed from extracting bin types via image alt attributes to deriving them from CSS class characters. Added date filtering and sorting.

Changes

Cohort / File(s) Summary
Moray Council Parser Refactoring
uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py
Switched endpoint from "disp_bins" to annual calendar view, introduced BIN_TYPE_MAP for CSS-to-bin-type mapping, replaced alt attribute parsing with CSS class character parsing, added HTTP 200 status check with early exit, implemented date filtering (on/after today) and sorting by collection date.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • dp247

Poem

🐰 A calendar change, what a grand sight,
Old bins flew away through the digital night,
CSS classes now whisper their tale,
Dates filter and sort without fail,
Moray's collections are sorted just right! 🗑️

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: rewriting MorayCouncil to parse an annual calendar instead of the removed 7-day view.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py`:
- Around line 49-92: The loop over month_container/day_div is swallowing
structural errors; update parsing in the loop that references month_container,
header.find("h2"), month_name, days_container, day_div, day_text, and
datetime(...) to raise descriptive exceptions instead of silently continuing
when the calendar markup is unexpectedly missing or malformed (e.g., missing
header/h2, unknown month_name not in month_names, missing days_container,
non-numeric day_text, or ValueError from datetime for invalid date); continue
only for known filler CSS classes like "blank" and "day-name", and include
context in the exception message (council name, month_name, raw HTML snippet or
class list, and offending value) so failures surface as a scraper break rather
than returning empty bindata["bins"] for functions/methods that use
self.BIN_TYPE_MAP, collection_date, today, and date_format.
- Around line 35-38: The Moray calendar fetch uses requests.get(url) without
timeout or exception handling in MorayCouncil (the response variable and bindata
return path); wrap the network call in a try/except that catches
requests.exceptions.RequestException, supply a reasonable timeout (e.g. 10s) to
requests.get, and on exception return bindata (and optionally log the error) so
the empty-result fallback is used consistently with other councils.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bec89e4a-b7d8-4651-b214-8c742be67de9

📥 Commits

Reviewing files that changed from the base of the PR and between 60bd3cc and f2d204a.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py

Comment on lines 35 to +38
response = requests.get(url)
print(f"Response status code: {response.status_code}") # Debug

# if response.status_code != 200:
# fallback_url = "https://bindayfinder.moray.gov.uk/cal_2024_view.php"
# print(f"Falling back to: {fallback_url}") # Debug
# response = requests.get(
# fallback_url,
# params={"id": user_uprn},
# )
# print(f"Fallback response status: {response.status_code}") # Debug
if response.status_code != 200:
return bindata
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -type f -name "MorayCouncil.py" 2>/dev/null | head -5

Repository: robbrad/UKBinCollectionData

Length of output: 1906


🏁 Script executed:

# Search for the file in a broader context
find . -type f -name "*MorayCouncil*" 2>/dev/null

Repository: robbrad/UKBinCollectionData

Length of output: 1906


🏁 Script executed:

# List files to understand repo structure
git ls-files | grep -i moray

Repository: robbrad/UKBinCollectionData

Length of output: 131


🏁 Script executed:

git ls-files | head -20

Repository: robbrad/UKBinCollectionData

Length of output: 784


🏁 Script executed:

cat -n uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py | head -60

Repository: robbrad/UKBinCollectionData

Length of output: 2440


🏁 Script executed:

cat -n uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py | tail -n +25 | head -n 40

Repository: robbrad/UKBinCollectionData

Length of output: 1767


🏁 Script executed:

# Check if there's any try-except or exception handling around this method call
rg -A 20 "def parse_data" uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py

Repository: robbrad/UKBinCollectionData

Length of output: 784


🏁 Script executed:

# Check parent class to see if there's exception handling at a higher level
cat -n uk_bin_collection/uk_bin_collection/get_bin_data.py | head -100

Repository: robbrad/UKBinCollectionData

Length of output: 3757


🏁 Script executed:

# Check more of the parent class to see if there's exception handling
cat -n uk_bin_collection/uk_bin_collection/get_bin_data.py | sed -n '74,150p'

Repository: robbrad/UKBinCollectionData

Length of output: 3106


🏁 Script executed:

# Check if other council implementations use timeout for requests.get
rg "requests\.get.*timeout" uk_bin_collection/uk_bin_collection/councils/ | head -20

Repository: robbrad/UKBinCollectionData

Length of output: 2144


🏁 Script executed:

# Check if any council implementations handle RequestException
rg "RequestException|requests\.exceptions" uk_bin_collection/uk_bin_collection/councils/ | head -20

Repository: robbrad/UKBinCollectionData

Length of output: 2188


Add timeout and exception handling to the Moray calendar fetch.

requests.get(url) at line 35 lacks a timeout and does not catch RequestException. This can cause indefinite hangs and unhandled exceptions that bypass the empty-result fallback. The parent class and other council implementations consistently use timeout parameters and exception handling for network calls. Add both to maintain reliability and consistency.

Suggested fix
-        response = requests.get(url)
+        try:
+            response = requests.get(url, timeout=10)
+        except requests.RequestException:
+            return bindata
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
response = requests.get(url)
print(f"Response status code: {response.status_code}") # Debug
# if response.status_code != 200:
# fallback_url = "https://bindayfinder.moray.gov.uk/cal_2024_view.php"
# print(f"Falling back to: {fallback_url}") # Debug
# response = requests.get(
# fallback_url,
# params={"id": user_uprn},
# )
# print(f"Fallback response status: {response.status_code}") # Debug
if response.status_code != 200:
return bindata
try:
response = requests.get(url, timeout=10)
except requests.RequestException:
return bindata
if response.status_code != 200:
return bindata
🧰 Tools
🪛 Ruff (0.15.9)

[error] 35-35: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py` around lines 35
- 38, The Moray calendar fetch uses requests.get(url) without timeout or
exception handling in MorayCouncil (the response variable and bindata return
path); wrap the network call in a try/except that catches
requests.exceptions.RequestException, supply a reasonable timeout (e.g. 10s) to
requests.get, and on exception return bindata (and optionally log the error) so
the empty-result fallback is used consistently with other councils.

Comment on lines +49 to +92
for month_container in soup.find_all("div", class_="month-container"):
header = month_container.find("div", class_="month-header")
if not header or not header.find("h2"):
continue
month_name = header.find("h2").text.strip()
if month_name not in month_names:
continue
month_num = month_names.index(month_name) + 1

days_container = month_container.find("div", class_="days-container")
if not days_container:
continue

day_num = 0
for day_div in days_container.find_all("div"):
css_classes = day_div.get("class", [])

# Skip blank days
if "blank" in css_classes:
continue

# Get the day number from the text
day_text = day_div.text.strip()
if not day_text or not day_text.isdigit():
continue
day_num = int(day_text)

# Check if this day has bin collection classes
for css_class in css_classes:
if css_class in ("blank", "day-name", ""):
continue

print(f"Final bindata: {bindata}") # Debug
# Each character in the CSS class represents a bin type
for char in css_class:
if char in self.BIN_TYPE_MAP:
try:
collection_date = datetime(year, month_num, day_num).date()
if collection_date >= today:
bindata["bins"].append({
"type": self.BIN_TYPE_MAP[char],
"collectionDate": collection_date.strftime(date_format),
})
except ValueError:
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fail fast when the calendar markup stops matching the expected shape.

This loop silently continues on missing headers, unknown month names, non-numeric day text, and invalid dates. That makes the next Moray markup change look like “no bins found” instead of a scraper breakage. Raise a descriptive exception once a month/day cell is structurally wrong, and only skip known filler cells.

Suggested fix
-            header = month_container.find("div", class_="month-header")
-            if not header or not header.find("h2"):
-                continue
+            header = month_container.find("div", class_="month-header")
+            if header is None or header.find("h2") is None:
+                raise ValueError("Moray calendar markup changed: missing month header")

             month_name = header.find("h2").text.strip()
-            if month_name not in month_names:
-                continue
             month_num = month_names.index(month_name) + 1

             days_container = month_container.find("div", class_="days-container")
-            if not days_container:
-                continue
+            if days_container is None:
+                raise ValueError(f"Moray calendar markup changed: missing days container for {month_name}")

             day_num = 0
             for day_div in days_container.find_all("div"):
                 css_classes = day_div.get("class", [])

                 if "blank" in css_classes:
                     continue

                 day_text = day_div.text.strip()
-                if not day_text or not day_text.isdigit():
-                    continue
+                if not day_text.isdigit():
+                    raise ValueError(f"Unexpected Moray day cell text: {day_text!r}")
                 day_num = int(day_text)
...
-                            except ValueError:
-                                continue
+                            except ValueError as exc:
+                                raise ValueError(
+                                    f"Invalid Moray calendar date: {year}-{month_num}-{day_num}"
+                                ) from exc

Based on learnings, in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures on unexpected formats over silent defaults or swallowed errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py` around lines 49
- 92, The loop over month_container/day_div is swallowing structural errors;
update parsing in the loop that references month_container, header.find("h2"),
month_name, days_container, day_div, day_text, and datetime(...) to raise
descriptive exceptions instead of silently continuing when the calendar markup
is unexpectedly missing or malformed (e.g., missing header/h2, unknown
month_name not in month_names, missing days_container, non-numeric day_text, or
ValueError from datetime for invalid date); continue only for known filler CSS
classes like "blank" and "day-name", and include context in the exception
message (council name, month_name, raw HTML snippet or class list, and offending
value) so failures surface as a scraper break rather than returning empty
bindata["bins"] for functions/methods that use self.BIN_TYPE_MAP,
collection_date, today, and date_format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant