
Add User-Agent to data parser for WestSuffolkCouncil #1961

Closed

pooley182 wants to merge 1 commit into robbrad:master from pooley182:master

Conversation

@pooley182 pooley182 commented Apr 12, 2026

With the default Python user agent, the West Suffolk website has begun returning a 404:

<body>
<div id="header"><h1>Server Error</h1></div>
<div id="content">
<div class="content-container"><fieldset>
<h2>404 - File or directory not found.</h2>
<h3>The resource you are looking for might have been removed, had its name changed, or is temporarily unavailable.</h3>
</fieldset></div>
</div>
</body>

This results in no bin data:

{
    "bins": []
}

By adding a generic Windows user agent, we are able to fetch the correct page data and correctly parse the bin information:

{
    "bins": [
        {
            "type": "Black bin",
            "collectionDate": "18/04/2026"
        },
        {
            "type": "Blue bin",
            "collectionDate": "13/04/2026"
        }
    ]
}
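The change described above can be sketched as follows. This is an illustration, not the exact code from the PR: the `HEADERS` name and the User-Agent string are placeholders, and the review summary notes the real change derives its headers from `requests.utils.default_headers()` before overriding the User-Agent.

```python
# Browser-like request headers. The User-Agent value below is
# illustrative only; the actual PR derives its headers from
# requests.utils.default_headers() and overrides the User-Agent.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    )
}

# Usage with the third-party requests package:
#     response = requests.get(api_url, headers=HEADERS)
```

Sending a browser-like User-Agent is a common workaround for sites that reject the default `python-requests/x.y.z` identifier.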

Fixes #1959

Summary by CodeRabbit

  • Bug Fixes
    • Improved bin collection data retrieval for West Suffolk Council by enhancing request headers. This update ensures more reliable access to your collection schedules, reducing potential issues when fetching data from the council system.


coderabbitai bot commented Apr 12, 2026

📝 Walkthrough

The change adds a custom User-Agent header to the HTTP request in the West Suffolk Council scraper. It replaces a bare requests.get() call with one that includes browser-like headers derived from requests.utils.default_headers() to address the issue of empty bin data responses.

Changes

  • West Suffolk Council Scraper — uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py: Added a custom User-Agent header to the HTTP request by passing a headers parameter with a browser-like user agent value.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~4 minutes

Poem

🐰 A header here, a header there,
User-Agent floating through the air,
West Suffolk's bins now speak once more,
No empty arrays at the door! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title accurately describes the main change: adding a User-Agent header to the WestSuffolkCouncil data parser to resolve the 404 issue.
  • Linked Issues Check — ✅ Passed: The PR directly addresses issue #1959 by adding a User-Agent header to fix the 404 responses from West Suffolk's website, restoring bin data retrieval.
  • Out of Scope Changes Check — ✅ Passed: All changes are scoped to WestSuffolkCouncil.py and directly address the linked issue with no extraneous modifications.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py`:
- Around line 25-27: The HTTP request using requests.get(api_url,
headers=headers) should include a timeout and should fail fast on non-2xx
responses: modify the call that produces response to pass a sensible timeout
(e.g. timeout=10) and immediately call response.raise_for_status() before
passing response.text into BeautifulSoup; update any surrounding try/except as
needed to handle requests.exceptions.Timeout/HTTPError and avoid parsing
empty/invalid content.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6cc2d9f4-428a-4c52-acf8-df1cb88ff495

📥 Commits

Reviewing files that changed from the base of the PR and between 60bd3cc and 4b2474b.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py

Comment on lines +25 to 27
response = requests.get(api_url, headers=headers)

soup = BeautifulSoup(response.text, features="html.parser")


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify this call site lacks timeout / raise_for_status in current PR state.
rg -n -C2 'requests\.get\(api_url,\s*headers=headers' uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py
rg -n -C2 'raise_for_status\(' uk_bin_collection/uk_bin_collection/councils/WestSuffolkCouncil.py

Repository: robbrad/UKBinCollectionData

Length of output: 223


Add timeout and fail fast on non-2xx responses

Line 25 performs an external HTTP call without a timeout, and the response is parsed without checking status. This can hang or silently degrade into empty results on upstream failures. Add timeout and response.raise_for_status() before parsing.

Proposed fix
-        response = requests.get(api_url, headers=headers)
+        response = requests.get(api_url, headers=headers, timeout=30)
+        response.raise_for_status()
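The reviewer's suggestion can be sketched as a small wrapper. The function name, timeout value, and error messages below are illustrative, not code from the PR:

```python
import requests


def fetch_page(api_url: str, headers: dict, timeout: float = 10.0) -> str:
    """Fetch a URL, failing fast on timeouts and non-2xx responses.

    Illustrative sketch of the review suggestion; not the PR's code.
    """
    try:
        response = requests.get(api_url, headers=headers, timeout=timeout)
        # Raise requests.exceptions.HTTPError on any 4xx/5xx status,
        # instead of silently parsing an error page into empty results.
        response.raise_for_status()
    except requests.exceptions.Timeout as exc:
        raise RuntimeError(f"Timed out fetching {api_url}") from exc
    except requests.exceptions.HTTPError as exc:
        raise RuntimeError(f"Bad response from {api_url}: {exc}") from exc
    return response.text
```

With this shape, the caller gets a clear exception (rather than an empty `bins` array) when the council site times out or returns the 404 error page described above.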

@pooley182
Author

Closing - just noticed someone has opened another PR to resolve this. #1955

@pooley182 pooley182 closed this Apr 12, 2026


Development

Successfully merging this pull request may close these issues.

West Suffolk Council stopped retrieving data

1 participant