fix: SouthStaffordshireDistrictCouncil - use objectId query param and new parse #1957
InertiaUK wants to merge 7 commits into robbrad:master
Torridge's SOAP API changed its response from explicit dates ("Mon 14 Apr")
to relative phrases ("Tomorrow then every Mon", "Today then every Tue")
with an embedded calendar table. The old regex-based parser returned empty
because it expected the old format.
The rewritten parser handles Today/Tomorrow/weekday-name phrases, falls back
to the old explicit format if it reappears, and gracefully skips "No X
collection for this address" entries.
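The phrase handling described above can be sketched roughly as follows. This is not the PR's parser: the function name `parse_collection_phrase`, the exact shape of the weekday-only phrases, and the choice to treat a bare weekday as its next occurrence after today are all assumptions made for illustration.

```python
import re
from datetime import date, datetime, timedelta
from typing import Optional

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def parse_collection_phrase(text: str, today: date) -> Optional[date]:
    """Return the next collection date described by `text`, or None (hypothetical helper)."""
    phrase = " ".join(text.split())
    lowered = phrase.lower()

    # "No X collection for this address" entries carry no date: skip them.
    if lowered.startswith("no ") and "collection" in lowered:
        return None
    if lowered.startswith("today"):
        return today
    if lowered.startswith("tomorrow"):
        return today + timedelta(days=1)

    # Old explicit format, e.g. "Mon 14 Apr" (the string has no year).
    explicit = re.match(r"^[A-Za-z]{3} (\d{1,2}) ([A-Za-z]{3})\b", phrase)
    if explicit:
        day, month = explicit.groups()
        try:
            parsed = datetime.strptime(f"{day} {month} {today.year}", "%d %b %Y").date()
            # Assumption: a date already in the past belongs to next year.
            if parsed < today:
                parsed = parsed.replace(year=parsed.year + 1)
        except ValueError:
            return None
        return parsed

    # Assumption: a bare weekday name means its next occurrence after today.
    weekday = re.search(r"\b(mon|tue|wed|thu|fri|sat|sun)(?:[a-z]*day)?\b", lowered)
    if weekday:
        target = WEEKDAYS.index(weekday.group(1))
        days_ahead = (target - today.weekday()) % 7 or 7
        return today + timedelta(days=days_ahead)
    return None
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the relative phrases testable against a fixed date.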
The Wyre bin collection page now includes a 'Download calendar' box among the .boxed divs. This box has no h3.bin-collection-tasks__heading, which caused the scraper to crash on .text access.

Changes:
- Skip boxes missing the heading or content container
- Use regex to extract the bin name from 'Your next X collection'
- Collapse whitespace on the date text before strptime (p tags produced multi-whitespace runs after .text)
- Roll year forward only when the computed date is in the past, instead of only handling the December->January edge case
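A minimal sketch of the Wyre changes, operating on already-extracted text so it stays independent of the HTML library. The helper name `parse_box` and the assumed date wording ("Monday 5 January", with no year) are illustrative guesses, not taken from the scraper.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple


def parse_box(
    heading: Optional[str], date_text: Optional[str], today: date
) -> Optional[Tuple[str, date]]:
    """Return (bin name, collection date) for one box, or None to skip it."""
    # Boxes such as 'Download calendar' have no heading or content: skip them.
    if not heading or not date_text:
        return None

    match = re.search(r"Your next (.+?) collection", heading)
    if not match:
        return None

    # Collapse the multi-whitespace runs left behind by nested <p> tags.
    cleaned = " ".join(date_text.split())

    # Assumption: the page prints "Weekday D Month" without a year.
    parsed = datetime.strptime(f"{cleaned} {today.year}", "%A %d %B %Y").date()

    # Roll forward when the computed date is already in the past.
    if parsed < today:
        parsed = parsed.replace(year=parsed.year + 1)
    return match.group(1), parsed
```

Comparing against `today` covers the December->January case and any other listing that wraps into the next year.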
…e results

The checkyourbinday page is now:
- Gated by Cloudflare Turnstile (requires non-headless UC)
- Using WASTECOLLECTIONCALENDARV7_* element IDs (was V5)
- Rendering collection dates inline as .gi-summary-blocklist__row divs after address selection, with no separate NEXT/submit step

Changes:
- Update all element IDs from V5 to V7
- Add Cloudflare challenge wait loop (up to 50s)
- Dismiss cookie consent before interaction (was blocking button clicks)
- Replace old table-based parsing with .gi-summary-blocklist__row scraping
- Use Select.select_by_visible_text with stale-retry instead of manual option.click() loop (which crashed on AJAX re-renders)
- Remove the smart_select_address helper's dead fuzzy/strict split
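The stale-retry idea can be sketched without Selenium as a generic helper. `retry_on` is a hypothetical name, and the attempt count and delay are assumptions rather than the values used in the PR.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_on(
    exc_type: Type[BaseException],
    action: Callable[[], T],
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Run `action`, retrying when it raises `exc_type` (hypothetical helper)."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_type:
            # Out of attempts: let the caller see the original error.
            if attempt == attempts:
                raise
            # Give the page time to finish its AJAX re-render.
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")
```

With Selenium, `exc_type` would be `StaleElementReferenceException` and `action` would re-find the `<select>` element on every call before invoking `Select(...).select_by_visible_text(...)`, so a re-rendered dropdown is looked up afresh rather than reused.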
…anner

Stockton's AchieveForms form name changed from LOOKUPBINDATESBYADDRESSSKIPOUTOFREGION to LOOKUPBINDATESBYADDRESSSKIPOUTOFREGIONV2, so every element ID needed the V2 suffix added. Also added a cookie-banner dismissal step, because the banner was covering the search button and intercepting clicks.
The MyWestSuffolk.aspx IIS endpoint returns a 404 page to requests without a User-Agent header. Adding a realistic Chrome UA restores the full response (57KB), letting the existing parser pick up the bin collection panel correctly.
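As an illustration of the User-Agent fix, here is a dependency-free sketch using the standard library's `urllib.request` (the scraper itself may well use `requests`). The UA string and the helper names `build_request` and `fetch` are assumptions; any realistic browser UA should behave the same way.

```python
import urllib.request

# Assumption: any realistic desktop Chrome UA string will do here.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Attach a browser-like User-Agent so the endpoint does not answer 404."""
    return urllib.request.Request(url, headers={"User-Agent": CHROME_UA})


def fetch(url: str, timeout: float = 30.0) -> str:
    """Fetch the page body as text (performs a real network request)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```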
The directory_search.php postcode endpoint the old scraper used is gone; Midlothian migrated to a MyMidlothian Granicus fillform iframe at my.midlothian.gov.uk.

Rewrote the scraper to:
- Load the Bin_Collection_Dates service page
- Switch into the fillform-frame-1 iframe
- Fill the postcode field (dropdown auto-populates on change)
- Select the matching address, which auto-fills six per-bin date fields (dateRecycling, dateFood, dateGarden, dateCard, dateGlass, dateResidual)
- Read the dates directly from those fields, parse them as dd/mm/yyyy
- No submit button needed

The fillform iframe detects headless Chrome and vanilla Selenium and refuses to populate the dropdown, so when a DISPLAY is available the scraper now uses undetected_chromedriver in non-headless mode.
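The final step, reading the six auto-filled fields and parsing them as dd/mm/yyyy, might look like the sketch below. The field names come from the description above; the display labels and the function name `parse_date_fields` are assumptions, and the input is a plain dict standing in for values read from the form.

```python
from datetime import date, datetime
from typing import Dict

# Field names from the form; the human-readable labels are assumed.
DATE_FIELDS = {
    "dateRecycling": "Recycling",
    "dateFood": "Food",
    "dateGarden": "Garden",
    "dateCard": "Card",
    "dateGlass": "Glass",
    "dateResidual": "Residual",
}


def parse_date_fields(values: Dict[str, str]) -> Dict[str, date]:
    """Map each populated per-bin field to a date, skipping blanks and junk."""
    result: Dict[str, date] = {}
    for field, label in DATE_FIELDS.items():
        raw = (values.get(field) or "").strip()
        if not raw:
            continue  # this bin is not collected at the address
        try:
            result[label] = datetime.strptime(raw, "%d/%m/%Y").date()
        except ValueError:
            continue  # not a dd/mm/yyyy date: ignore rather than crash
    return result
```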
…ew parse

The old where-i-live?uprn= query parameter is a placeholder that always returns the 'van collection' fallback message. The real bin calendar is served by where-i-live?objectId= (same UPRN value, different query key). This only became visible after tracing the ajax_form=1 postcode lookup flow on /viewyourcollectioncalendar: the form POST ultimately redirects to /where-i-live?objectId=<UPRN>.

Changes:
- Scraper now does its own GET with objectId rather than relying on the framework's pre-fetched page (which used the wrong query key).
- Parses the new result structure: a 'next collection' summary card (p.collection-date / p.collection-type) plus a subsequent-collections leisure-table with 'Day, D Month YYYY' date strings.
- Splits composite bin types like 'Recycling & Garden waste' into separate entries for each underlying bin.
- De-dupes the summary/table overlap.

Test fixture in input.json needs updating to a real residential UPRN (e.g. 100031802117 = 1 Kempson Road, ST19 5BG), because the previous test UPRN 200004523954 is a genuine van-collection property.
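To make the objectId switch, the date format, the composite split, and the de-dupe concrete, here is a rough sketch that works on already-extracted (bin type, date text) pairs. All function names are hypothetical and the base URL is supplied by the caller; the PR's actual code may be structured differently.

```python
from datetime import date, datetime
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlencode


def calendar_url(base: str, uprn: str) -> str:
    """Build the lookup URL with objectId (not uprn) as the query key."""
    return f"{base}?{urlencode({'objectId': uprn})}"


def parse_collection_date(text: str) -> date:
    """Parse 'Day, D Month YYYY', e.g. 'Tuesday, 14 April 2026'."""
    return datetime.strptime(" ".join(text.split()), "%A, %d %B %Y").date()


def split_bin_types(label: str) -> List[str]:
    """Split 'Recycling & Garden waste' into its underlying bins."""
    return [part.strip() for part in label.split("&") if part.strip()]


def collect(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, date]]:
    """Expand composite rows and drop the summary/table duplicates."""
    seen: Set[Tuple[str, date]] = set()
    result: List[Tuple[str, date]] = []
    for bin_label, date_text in rows:
        when = parse_collection_date(date_text)
        for bin_type in split_bin_types(bin_label):
            key = (bin_type, when)
            if key not in seen:
                seen.add(key)
                result.append(key)
    return result
```

Feeding the summary card's row first and the table rows after it means the first table row, which repeats the summary, is dropped while the original ordering is preserved.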
What this changes
This is a sneaky regression. The scraper uses the URL where-i-live?uprn=<UPRN>, which looks reasonable and returns a 200 OK with a div#showCollectionDates, but the response is always the 'van collection' fallback message. The existing scraper correctly detects this fallback and returns an empty bins list, so every address looks like a van-collection property.

Root cause
The ?uprn= query parameter is a deprecated placeholder. The real calendar is now served by where-i-live?objectId=<UPRN>: same UPRN value, different query key. I only found this by tracing the AJAX flow on the public form at /viewyourcollectioncalendar, which submits to ajax_form=1 with the postcode and returns an address dropdown. Picking an address submits the form, which redirects to where-i-live?objectId=<UPRN>, and that page contains the real bin data.

What the new scraper does
- Does its own requests.get with ?objectId=<UPRN> and a browser User-Agent; the framework's pre-fetch on the old URL can't be re-used because the query key is wrong.
- Parses p.collection-date + p.collection-type for the "next collection" summary card at the top of div#showCollectionDates.
- Parses the table.leisure-table underneath for subsequent collections.
- Parses dates in the form "Tuesday, 14 April 2026" (comma-separated, not "Tue 14 April 2026").
- Splits "Recycling & Garden waste" into separate Recycling and Garden waste entries, since those are physically different bins.

Test fixture update
The previous fixture UPRN 200004523954 is a genuine van-collection property (confirmed by hitting the new objectId= URL), so it would still look empty to any reader running the tests. Bumping it to 100031802117 (1 Kempson Road, Penkridge ST19 5BG), a real residential property with a full fortnightly rotation. The url field is updated to the new query param and the wiki_note explains the ?uprn= vs ?objectId= gotcha for anyone copying the URL pattern.

Test
Verified via collect_data against the new fixture.