fix: MorayCouncil - rewrite for annual calendar (7-day view removed)#1941
fix: MorayCouncil - rewrite for annual calendar (7-day view removed)#1941InertiaUK wants to merge 1 commit intorobbrad:masterfrom
Conversation
📝 WalkthroughWalkthroughThe MorayCouncil scraper's data-fetching logic was refactored to use the annual calendar endpoint instead of the previous "disp_bins" endpoint. HTML parsing strategy changed from extracting bin types via image Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py`:
- Around line 49-92: The loop over month_container/day_div is swallowing
structural errors; update parsing in the loop that references month_container,
header.find("h2"), month_name, days_container, day_div, day_text, and
datetime(...) to raise descriptive exceptions instead of silently continuing
when the calendar markup is unexpectedly missing or malformed (e.g., missing
header/h2, unknown month_name not in month_names, missing days_container,
non-numeric day_text, or ValueError from datetime for invalid date); continue
only for known filler CSS classes like "blank" and "day-name", and include
context in the exception message (council name, month_name, raw HTML snippet or
class list, and offending value) so failures surface as a scraper break rather
than returning empty bindata["bins"] for functions/methods that use
self.BIN_TYPE_MAP, collection_date, today, and date_format.
- Around line 35-38: The Moray calendar fetch uses requests.get(url) without
timeout or exception handling in MorayCouncil (the response variable and bindata
return path); wrap the network call in a try/except that catches
requests.exceptions.RequestException, supply a reasonable timeout (e.g. 10s) to
requests.get, and on exception return bindata (and optionally log the error) so
the empty-result fallback is used consistently with other councils.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bec89e4a-b7d8-4651-b214-8c742be67de9
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py
| response = requests.get(url) | ||
| print(f"Response status code: {response.status_code}") # Debug | ||
|
|
||
| # if response.status_code != 200: | ||
| # fallback_url = "https://bindayfinder.moray.gov.uk/cal_2024_view.php" | ||
| # print(f"Falling back to: {fallback_url}") # Debug | ||
| # response = requests.get( | ||
| # fallback_url, | ||
| # params={"id": user_uprn}, | ||
| # ) | ||
| # print(f"Fallback response status: {response.status_code}") # Debug | ||
| if response.status_code != 200: | ||
| return bindata |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd /tmp && find . -type f -name "MorayCouncil.py" 2>/dev/null | head -5Repository: robbrad/UKBinCollectionData
Length of output: 1906
🏁 Script executed:
# Search for the file in a broader context
find . -type f -name "*MorayCouncil*" 2>/dev/nullRepository: robbrad/UKBinCollectionData
Length of output: 1906
🏁 Script executed:
# List files to understand repo structure
git ls-files | grep -i morayRepository: robbrad/UKBinCollectionData
Length of output: 131
🏁 Script executed:
git ls-files | head -20Repository: robbrad/UKBinCollectionData
Length of output: 784
🏁 Script executed:
cat -n uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py | head -60Repository: robbrad/UKBinCollectionData
Length of output: 2440
🏁 Script executed:
cat -n uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py | tail -n +25 | head -n 40Repository: robbrad/UKBinCollectionData
Length of output: 1767
🏁 Script executed:
# Check if there's any try-except or exception handling around this method call
rg -A 20 "def parse_data" uk_bin_collection/uk_bin_collection/councils/MorayCouncil.pyRepository: robbrad/UKBinCollectionData
Length of output: 784
🏁 Script executed:
# Check parent class to see if there's exception handling at a higher level
cat -n uk_bin_collection/uk_bin_collection/get_bin_data.py | head -100Repository: robbrad/UKBinCollectionData
Length of output: 3757
🏁 Script executed:
# Check more of the parent class to see if there's exception handling
cat -n uk_bin_collection/uk_bin_collection/get_bin_data.py | sed -n '74,150p'Repository: robbrad/UKBinCollectionData
Length of output: 3106
🏁 Script executed:
# Check if other council implementations use timeout for requests.get
rg "requests\.get.*timeout" uk_bin_collection/uk_bin_collection/councils/ | head -20Repository: robbrad/UKBinCollectionData
Length of output: 2144
🏁 Script executed:
# Check if any council implementations handle RequestException
rg "RequestException|requests\.exceptions" uk_bin_collection/uk_bin_collection/councils/ | head -20Repository: robbrad/UKBinCollectionData
Length of output: 2188
Add timeout and exception handling to the Moray calendar fetch.
requests.get(url) at line 35 lacks a timeout and does not catch RequestException. This can cause indefinite hangs and unhandled exceptions that bypass the empty-result fallback. The parent class and other council implementations consistently use timeout parameters and exception handling for network calls. Add both to maintain reliability and consistency.
Suggested fix
- response = requests.get(url)
+ try:
+ response = requests.get(url, timeout=10)
+ except requests.RequestException:
+ return bindata📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| response = requests.get(url) | |
| print(f"Response status code: {response.status_code}") # Debug | |
| # if response.status_code != 200: | |
| # fallback_url = "https://bindayfinder.moray.gov.uk/cal_2024_view.php" | |
| # print(f"Falling back to: {fallback_url}") # Debug | |
| # response = requests.get( | |
| # fallback_url, | |
| # params={"id": user_uprn}, | |
| # ) | |
| # print(f"Fallback response status: {response.status_code}") # Debug | |
| if response.status_code != 200: | |
| return bindata | |
| try: | |
| response = requests.get(url, timeout=10) | |
| except requests.RequestException: | |
| return bindata | |
| if response.status_code != 200: | |
| return bindata |
🧰 Tools
🪛 Ruff (0.15.9)
[error] 35-35: Probable use of requests call without timeout
(S113)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py` around lines 35
- 38, The Moray calendar fetch uses requests.get(url) without timeout or
exception handling in MorayCouncil (the response variable and bindata return
path); wrap the network call in a try/except that catches
requests.exceptions.RequestException, supply a reasonable timeout (e.g. 10s) to
requests.get, and on exception return bindata (and optionally log the error) so
the empty-result fallback is used consistently with other councils.
| for month_container in soup.find_all("div", class_="month-container"): | ||
| header = month_container.find("div", class_="month-header") | ||
| if not header or not header.find("h2"): | ||
| continue | ||
| month_name = header.find("h2").text.strip() | ||
| if month_name not in month_names: | ||
| continue | ||
| month_num = month_names.index(month_name) + 1 | ||
|
|
||
| days_container = month_container.find("div", class_="days-container") | ||
| if not days_container: | ||
| continue | ||
|
|
||
| day_num = 0 | ||
| for day_div in days_container.find_all("div"): | ||
| css_classes = day_div.get("class", []) | ||
|
|
||
| # Skip blank days | ||
| if "blank" in css_classes: | ||
| continue | ||
|
|
||
| # Get the day number from the text | ||
| day_text = day_div.text.strip() | ||
| if not day_text or not day_text.isdigit(): | ||
| continue | ||
| day_num = int(day_text) | ||
|
|
||
| # Check if this day has bin collection classes | ||
| for css_class in css_classes: | ||
| if css_class in ("blank", "day-name", ""): | ||
| continue | ||
|
|
||
| print(f"Final bindata: {bindata}") # Debug | ||
| # Each character in the CSS class represents a bin type | ||
| for char in css_class: | ||
| if char in self.BIN_TYPE_MAP: | ||
| try: | ||
| collection_date = datetime(year, month_num, day_num).date() | ||
| if collection_date >= today: | ||
| bindata["bins"].append({ | ||
| "type": self.BIN_TYPE_MAP[char], | ||
| "collectionDate": collection_date.strftime(date_format), | ||
| }) | ||
| except ValueError: | ||
| continue |
There was a problem hiding this comment.
Fail fast when the calendar markup stops matching the expected shape.
This loop silently continues on missing headers, unknown month names, non-numeric day text, and invalid dates. That makes the next Moray markup change look like “no bins found” instead of a scraper breakage. Raise a descriptive exception once a month/day cell is structurally wrong, and only skip known filler cells.
Suggested fix
- header = month_container.find("div", class_="month-header")
- if not header or not header.find("h2"):
- continue
+ header = month_container.find("div", class_="month-header")
+ if header is None or header.find("h2") is None:
+ raise ValueError("Moray calendar markup changed: missing month header")
month_name = header.find("h2").text.strip()
- if month_name not in month_names:
- continue
month_num = month_names.index(month_name) + 1
days_container = month_container.find("div", class_="days-container")
- if not days_container:
- continue
+ if days_container is None:
+ raise ValueError(f"Moray calendar markup changed: missing days container for {month_name}")
day_num = 0
for day_div in days_container.find_all("div"):
css_classes = day_div.get("class", [])
if "blank" in css_classes:
continue
day_text = day_div.text.strip()
- if not day_text or not day_text.isdigit():
- continue
+ if not day_text.isdigit():
+ raise ValueError(f"Unexpected Moray day cell text: {day_text!r}")
day_num = int(day_text)
...
- except ValueError:
- continue
+ except ValueError as exc:
+ raise ValueError(
+ f"Invalid Moray calendar date: {year}-{month_num}-{day_num}"
+ ) from excBased on learnings, in uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures on unexpected formats over silent defaults or swallowed errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/MorayCouncil.py` around lines 49
- 92, The loop over month_container/day_div is swallowing structural errors;
update parsing in the loop that references month_container, header.find("h2"),
month_name, days_container, day_div, day_text, and datetime(...) to raise
descriptive exceptions instead of silently continuing when the calendar markup
is unexpectedly missing or malformed (e.g., missing header/h2, unknown
month_name not in month_names, missing days_container, non-numeric day_text, or
ValueError from datetime for invalid date); continue only for known filler CSS
classes like "blank" and "day-name", and include context in the exception
message (council name, month_name, raw HTML snippet or class list, and offending
value) so failures surface as a scraper break rather than returning empty
bindata["bins"] for functions/methods that use self.BIN_TYPE_MAP,
collection_date, today, and date_format.
Moray Council removed their 7-day schedule view and replaced it with an annual calendar page. The old scraper was looking for elements that no longer exist.
Rewrote the scraper to parse the annual calendar at /my-area/bins. Each collection date is encoded in CSS classes on the calendar day cells, with the bin type determined by the class name. Parses all future dates from the current month onwards.
Tested with an IV postcode in Moray.
Summary by CodeRabbit