Add User-Agent to data parser for WestSuffolkCouncil #1961
pooley182 wants to merge 1 commit into robbrad:master
Conversation
📝 Walkthrough

The change adds a custom User-Agent header to the HTTP request in the West Suffolk Council scraper, replacing a bare request with one that sends browser-style headers.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~4 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 warning
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py`:
- Around line 25-27: The HTTP request using requests.get(api_url,
headers=headers) should include a timeout and should fail fast on non-2xx
responses: modify the call that produces response to pass a sensible timeout
(e.g. timeout=10) and immediately call response.raise_for_status() before
passing response.text into BeautifulSoup; update any surrounding try/except as
needed to handle requests.exceptions.Timeout/HTTPError and avoid parsing
empty/invalid content.
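The remediation the prompt describes can be sketched as follows. This is a minimal illustration, not the actual file contents: the `fetch_page` helper name is invented here, and `api_url` / `headers` stand in for the scraper's own variables; the `timeout=10` value mirrors the figure suggested in the comment.

```python
import requests
from bs4 import BeautifulSoup


def fetch_page(api_url: str, headers: dict) -> BeautifulSoup:
    """Fetch a page with a timeout and fail fast on non-2xx responses."""
    try:
        # timeout=10 is the "sensible timeout" the review comment suggests
        response = requests.get(api_url, headers=headers, timeout=10)
        # Raise HTTPError immediately on 4xx/5xx instead of parsing an
        # error page (or empty body) with BeautifulSoup
        response.raise_for_status()
    except requests.exceptions.Timeout:
        raise RuntimeError(f"Request to {api_url} timed out")
    except requests.exceptions.HTTPError as err:
        raise RuntimeError(f"Upstream returned an error: {err}")
    # Only parse once we know the body is a successful response
    return BeautifulSoup(response.text, features="html.parser")
```

With this shape, an upstream 404 (such as the one this PR works around) surfaces as an explicit error rather than silently degrading into an empty bin list.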
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py
```python
response = requests.get(api_url, headers=headers)

soup = BeautifulSoup(response.text, features="html.parser")
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify this call site lacks timeout / raise_for_status in current PR state.
rg -n -C2 'requests\.get\(api_url,\s*headers=headers' uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py
rg -n -C2 'raise_for_status\(' uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py

Repository: robbrad/UKBinCollectionData
Length of output: 223
Add timeout and fail fast on non-2xx responses
Line 25 performs an external HTTP call without a timeout, and the response is parsed without checking status. This can hang or silently degrade into empty results on upstream failures. Add timeout and response.raise_for_status() before parsing.
Proposed fix
```diff
-response = requests.get(api_url, headers=headers)
+response = requests.get(api_url, headers=headers, timeout=30)
+response.raise_for_status()
```
Closing - Just noticed someone has opened another PR to resolve this. #1955
With the default Python user agent the West Suffolk website has begun returning a 404, resulting in no bin data:

```json
{ "bins": [] }
```

By adding a generic Windows user agent we are able to fetch the correct page data and correctly parse the bin information:

```json
{
  "bins": [
    { "type": "Black bin", "collectionDate": "18/04/2026" },
    { "type": "Blue bin", "collectionDate": "13/04/2026" }
  ]
}
```

Fixes #1959
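The fix described above, sending a browser-style User-Agent so the council site returns real data instead of a 404, can be sketched as below. The exact header string in the commit is not shown here, so the one used is an illustrative generic Windows/Chrome string, and `get_with_browser_ua` is a hypothetical helper name:

```python
import requests

# Without explicit headers, requests identifies itself as
# "python-requests/x.y.z", which the West Suffolk site now answers
# with a 404. A generic Windows browser UA string avoids that.
# (The exact string below is illustrative, not the committed one.)
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}


def get_with_browser_ua(api_url: str) -> requests.Response:
    """Issue the request with the browser-style headers attached."""
    return requests.get(api_url, headers=BROWSER_HEADERS, timeout=30)
```

Combined with the reviewer's timeout/`raise_for_status` suggestion, this keeps the scraper from hanging while still presenting a user agent the site accepts.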