
Add User-Agent to data parser for WestSuffolkCouncil #1961

Closed

pooley182 wants to merge 1 commit into robbrad:master from pooley182:master

Conversation

@pooley182 pooley182 commented Apr 12, 2026

With the default Python user agent, the West Suffolk website has begun returning a 404:

<body>
<div id="header"><h1>Server Error</h1></div>
<div id="content">
<div class="content-container"><fieldset>
<h2>404 - File or directory not found.</h2>
<h3>The resource you are looking for might have been removed, had its name changed, or is temporarily unavailable.</h3>
</fieldset></div>
</div>
</body>

This results in no bin data:

{
    "bins": []
}

By adding a generic Windows user agent, we are able to fetch the correct page data and correctly parse the bin information:

{
    "bins": [
        {
            "type": "Black bin",
            "collectionDate": "18/04/2026"
        },
        {
            "type": "Blue bin",
            "collectionDate": "13/04/2026"
        }
    ]
}
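The change described above can be sketched as follows. This is an illustration, not the exact code from the PR: the `HEADERS` name and the User-Agent string are placeholders, and the review summary notes the real change derives its headers from `requests.utils.default_headers()` before overriding the User-Agent.

```python
# Browser-like request headers. The User-Agent value below is
# illustrative only; the actual PR derives its headers from
# requests.utils.default_headers() and overrides the User-Agent.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    )
}

# Usage with the third-party requests package:
#     response = requests.get(api_url, headers=HEADERS)
```

Sending a browser-like User-Agent is a common workaround for sites that reject the default `python-requests/x.y.z` identifier.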

Fixes #1959

Summary by CodeRabbit

  • Bug Fixes
    • Improved bin collection data retrieval for West Suffolk Council by enhancing request headers. This update ensures more reliable access to your collection schedules, reducing potential issues when fetching data from the council system.


coderabbitai bot commented Apr 12, 2026

📝 Walkthrough

The change adds a custom User-Agent header to the HTTP request in the West Suffolk Council scraper. It replaces a bare requests.get() call with one that includes browser-like headers derived from requests.utils.default_headers() to address the issue of empty bin data responses.

Changes

  • West Suffolk Council Scraper — uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py: Added a custom User-Agent header to the HTTP request by passing a headers parameter with a browser-like user agent value.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~4 minutes

Poem

🐰 A header here, a header there,
User-Agent floating through the air,
West Suffolk's bins now speak once more,
No empty arrays at the door! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title accurately describes the main change: adding a User-Agent header to the WestSuffolkCouncil data parser to resolve the 404 issue.
  • Linked Issues Check — ✅ Passed: The PR directly addresses issue #1959 by adding a User-Agent header to fix the 404 responses from West Suffolk's website, restoring bin data retrieval.
  • Out of Scope Changes Check — ✅ Passed: All changes are scoped to WestSuffolkCouncil.py and directly address the linked issue with no extraneous modifications.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py`:
- Around line 25-27: The HTTP request using requests.get(api_url,
headers=headers) should include a timeout and should fail fast on non-2xx
responses: modify the call that produces response to pass a sensible timeout
(e.g. timeout=10) and immediately call response.raise_for_status() before
passing response.text into BeautifulSoup; update any surrounding try/except as
needed to handle requests.exceptions.Timeout/HTTPError and avoid parsing
empty/invalid content.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6cc2d9f4-428a-4c52-acf8-df1cb88ff495

📥 Commits

Reviewing files that changed from the base of the PR and between 60bd3cc and 4b2474b.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py

Comment on lines +25 to 27
response = requests.get(api_url, headers=headers)

soup = BeautifulSoup(response.text, features="html.parser")


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify this call site lacks timeout / raise_for_status in current PR state.
rg -n -C2 'requests\.get\(api_url,\s*headers=headers' uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py
rg -n -C2 'raise_for_status\(' uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py

Repository: robbrad/UKBinCollectionData

Length of output: 223


Add timeout and fail fast on non-2xx responses

Line 25 performs an external HTTP call without a timeout, and the response is parsed without checking status. This can hang or silently degrade into empty results on upstream failures. Add timeout and response.raise_for_status() before parsing.

Proposed fix
-        response = requests.get(api_url, headers=headers)
+        response = requests.get(api_url, headers=headers, timeout=30)
+        response.raise_for_status()
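The reviewer's suggestion can be sketched as a small wrapper. The function name, timeout value, and error messages below are illustrative, not code from the PR:

```python
import requests


def fetch_page(api_url: str, headers: dict, timeout: float = 10.0) -> str:
    """Fetch a URL, failing fast on timeouts and non-2xx responses.

    Illustrative sketch of the review suggestion; not the PR's code.
    """
    try:
        response = requests.get(api_url, headers=headers, timeout=timeout)
        # Raise requests.exceptions.HTTPError on any 4xx/5xx status,
        # instead of silently parsing an error page into empty results.
        response.raise_for_status()
    except requests.exceptions.Timeout as exc:
        raise RuntimeError(f"Timed out fetching {api_url}") from exc
    except requests.exceptions.HTTPError as exc:
        raise RuntimeError(f"Bad response from {api_url}: {exc}") from exc
    return response.text
```

With this shape, the caller gets a clear exception (rather than an empty `bins` array) when the council site times out or returns the 404 error page described above.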

@pooley182
Author

Closing - just noticed someone has opened another PR to resolve this. #1955

@pooley182 pooley182 closed this Apr 12, 2026


Development

Successfully merging this pull request may close these issues.

West Suffolk Council stopped retrieving data

1 participant