This repo builds static HTML reports for TV shows and movie rentals. The current system is small, fragile, and scraper-driven, so future agents should optimize for accurate diagnosis, minimal churn, and end-to-end verification.
- `eztv.rss.js`: Maintains `eztv.rss.json` from the EZTV RSS feed at https://myrss.org/eztv.
- `tvshows.puppeteer.js`: Reads `eztv.rss.json`, enriches each show from IMDb, caches details in `shows.json`, and writes `tvshows.html`.
Important invariant:
- `eztv.rss.json` now contains one entry per show, not one entry per episode.
- The stored episode should be the most recently seen episode for that show.
- `rentals.puppeteer.js`: Scrapes Official Charts, enriches from IMDb, and writes `movierentals.html`.
- `telescuff`: Shell entrypoint used for cron-style runs and optional Neocities upload.
Current modes:
- `./telescuff rss`: Refresh only `eztv.rss.json`.
- `./telescuff tv`: Build `tvshows.html` from the existing RSS cache.
- `./telescuff movies`: Build `movierentals.html`.
- `./telescuff all`: Build movies + tv, but does not run the high-frequency RSS refresh.
Intended cron split:
- run `rss` frequently
- run `tv` less frequently
- `eztv.rss.js`: RSS ingestion and cache maintenance.
- `eztv.rss.json`: Local TV source-of-truth cache, one row per show.
- `tvshows.puppeteer.js`: IMDb search + title scraping for TV.
- `shows.json`: Local IMDb enrichment cache for TV titles.
- `rentals.puppeteer.js`: Movie rental scraper.
- `body.ejs`: Shared HTML template.
- `telescuff`: Main wrapper script for cron/manual runs.
Legacy files exist but should generally not be used for new work:
- `tvshows.js`
- `scrape.js`
- `rentals.casper.js`
eztv.rss.js:
- fetches RSS entries
- parses titles with `episode-parser`
- keys by normalized show name
- keeps the highest season/episode seen for that show
- updates `last_seen`
- evicts entries older than `--max-days`
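The merge rules above can be sketched as a pure function. This is a minimal illustration, not the code in `eztv.rss.js`: the field names (`show`, `season`, `episode`, `last_seen`), the `normalizeShowName` helper, the object-keyed cache shape, and the choice to refresh `last_seen` whenever a show appears in the feed are all assumptions. Titles are assumed to be parsed already.

```javascript
// Hypothetical normalizer: lowercase, collapse punctuation and whitespace.
function normalizeShowName(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// True when episode a is later than episode b (season first, then episode).
function isNewer(a, b) {
  return a.season !== b.season ? a.season > b.season : a.episode > b.episode;
}

// cache: object keyed by normalized show name (assumed shape).
// parsed: array of { show, season, episode } extracted from RSS titles.
function mergeEntries(cache, parsed, nowMs, maxDays) {
  const next = { ...cache };
  for (const item of parsed) {
    const key = normalizeShowName(item.show);
    const existing = next[key];
    // Keep whichever episode is later, so there is one entry per show.
    const winner = !existing || isNewer(item, existing)
      ? { ...existing, ...item }
      : existing;
    next[key] = { ...winner, last_seen: nowMs };
  }
  // Evict shows that have not been seen within the last maxDays days.
  const cutoff = nowMs - maxDays * 24 * 60 * 60 * 1000;
  for (const key of Object.keys(next)) {
    if (next[key].last_seen < cutoff) delete next[key];
  }
  return next;
}
```

Keeping the merge free of I/O makes the one-entry-per-show invariant easy to check in isolation before touching the real cache file.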
If you change this script:
- preserve the one-entry-per-show invariant
- keep the JSON shape stable unless you also update `tvshows.puppeteer.js`
tvshows.puppeteer.js:
- loads `eztv.rss.json`
- merges any known data from `shows.json`
- searches IMDb for missing `url`
- opens IMDb title pages for missing details
- writes `shows.json`
- renders `tvshows.html`
IMDb is currently the hardest moving part in the repo.
Observed behavior:
- search pages sometimes render normally
- search pages sometimes show a broken shell with the browser title: `Application error: a client-side exception has occurred`
- even in that broken state, `script#__NEXT_DATA__` often still contains usable search/title data
- the page console emits a lot of noisy client-side errors from IMDb itself
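To illustrate why the broken shell is still usable, here is a minimal sketch that pulls the `__NEXT_DATA__` payload out of raw HTML. The function name `extractNextData` is hypothetical, the sample payload is a placeholder rather than IMDb's real structure, and the actual scraper may read the tag through Puppeteer instead of a regex.

```javascript
// Pull the JSON payload out of <script id="__NEXT_DATA__"> in raw HTML.
// Returns the parsed object, or null when the tag is missing or unparseable.
function extractNextData(html) {
  const match = html.match(
    /<script[^>]*\bid=["']__NEXT_DATA__["'][^>]*>([\s\S]*?)<\/script>/i
  );
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null;
  }
}

// A page whose visible UI failed but whose embedded JSON is intact.
const brokenShell =
  "<html><head><title>Application error: a client-side exception has occurred</title></head>" +
  '<body><script id="__NEXT_DATA__" type="application/json">' +
  '{"props":{"pageProps":{"placeholder":true}}}' +
  "</script></body></html>";

const data = extractNextData(brokenShell);
console.log(data !== null); // the payload parsed despite the error title
```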
Current scraper strategy:
- navigate with `waitUntil: "domcontentloaded"`
- wait for either:
  - a visible page element, or
  - parseable `__NEXT_DATA__`
- prefer structured JSON extraction over fragile DOM-only scraping
- keep DOM/meta/ld+json fallbacks where they add value
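The "wait for either" step can be sketched with two Puppeteer waits raced against each other. This is an assumption-laden outline, not the repo's implementation: `waitForReadiness`, the `selector` argument, and the 15-second default are hypothetical, and the returned labels simply reuse the `visible-selector` and `json-ready` names.

```javascript
// Resolves with "visible-selector" or "json-ready", whichever happens first.
// `page` is expected to be a Puppeteer Page; `selector` is caller-supplied.
async function waitForReadiness(page, selector, timeoutMs = 15000) {
  const visible = page
    .waitForSelector(selector, { visible: true, timeout: timeoutMs })
    .then(() => "visible-selector");

  // Runs in the browser: truthy once __NEXT_DATA__ exists and parses as JSON.
  const jsonReady = page
    .waitForFunction(
      () => {
        const el = document.querySelector("script#__NEXT_DATA__");
        if (!el) return false;
        try {
          JSON.parse(el.textContent);
          return true;
        } catch {
          return false;
        }
      },
      { timeout: timeoutMs }
    )
    .then(() => "json-ready");

  // Promise.any ignores the slower wait's failure and rejects only if both fail.
  return Promise.any([visible, jsonReady]);
}
```

A caller would first run `page.goto(url, { waitUntil: "domcontentloaded" })`, then log the returned label so timing output shows which readiness path was taken.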
Do not assume:
- visible DOM means the best data is there
- `__NEXT_DATA__` appears immediately
- console errors imply scraper failure
First suspects:
- overly conservative wait strategy
- waiting for visible UI when parseable JSON is already available
- IMDb serving the broken shell page
What to check:
- `Navigate timing ...`
- `IMDB search timing ...`
- whether readiness completed via `visible-selector` or `json-ready`
Not every null is a selector bug.
What we have already confirmed:
- many `rating: null` cases are real unrated IMDb pages
- some `description` and `duration` gaps can be recovered from:
  - `application/ld+json`
  - `meta[name="description"]`
  - `meta[property="og:description"]`
Before changing selectors:
- inspect the live page
- determine whether IMDb actually has the value
- avoid using generic IMDb boilerplate descriptions as show descriptions
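One way to combine the fallback sources with the boilerplate rule is a small selection helper. This is only a sketch: `pickDescription`, the property names on `sources`, and the source order are assumptions, and the boilerplate test is left to the caller because the exact wording of IMDb's generic text is not documented here.

```javascript
// Returns the first usable description, trying sources in this assumed order:
// ld+json, then meta[name="description"], then meta[property="og:description"].
// `isBoilerplate` is supplied by the caller; no IMDb wording is assumed here.
function pickDescription(sources, isBoilerplate = () => false) {
  const candidates = [
    sources.ldJson,
    sources.metaDescription,
    sources.ogDescription,
  ];
  for (const candidate of candidates) {
    const text = typeof candidate === "string" ? candidate.trim() : "";
    if (text && !isBoilerplate(text)) return text;
  }
  return null; // Report real absence rather than inventing a value.
}
```

Returning `null` when every source is empty or generic keeps a genuinely missing description distinguishable from a selector failure.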
A show that resolves to the wrong IMDb title is a search quality issue, not a title-page selector issue.
Example class of problem:
- a UK title resolving to a US series with the same/similar name
If this becomes frequent:
- improve search scoring/matching logic
- do not paper over it with title-page selector changes
More than one entry for the same show in `eztv.rss.json` is a bug in `eztv.rss.js`.
Expected state:
- one JSON entry per normalized show name
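A quick way to check this expected state is to group entries by normalized name and report any group with more than one member. The sketch below assumes the entries are available as an array of objects with a `show` field and uses a hypothetical `normalizeShowName`. If the file is an object keyed by name, `Object.values` would produce the array first.

```javascript
// Hypothetical normalizer: lowercase, collapse punctuation and whitespace.
function normalizeShowName(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Returns a Map of normalized name -> entries, only for names seen more than once.
function findDuplicateShows(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const key = normalizeShowName(entry.show);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(entry);
  }
  return new Map([...groups].filter(([, list]) => list.length > 1));
}

// Example: two spellings of the same show collapse to one key.
const duplicates = findDuplicateShows([
  { show: "The Show", season: 2, episode: 4 },
  { show: "the.show", season: 2, episode: 3 },
  { show: "Another Show", season: 1, episode: 1 },
]);
console.log([...duplicates.keys()]);
```

An empty result means the invariant holds for the data that was checked.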
For TV issues:
- Verify `eztv.rss.json` shape first.
- Confirm whether the problem is:
  - RSS parsing
  - IMDb search match
  - IMDb title extraction
  - cache reuse
- Use focused live probes rather than whole-pipeline runs when possible.
- If changing IMDb waits, measure timing before and after.
Useful commands:
node eztv.rss.js --max-days 7
node tvshows.puppeteer.js
./telescuff rss
./telescuff tv

Low-cost checks:
node --check eztv.rss.js
node --check tvshows.puppeteer.js
bash -n telescuff

- Prefer Puppeteer over CasperJS for new work.
- Keep changes small and behaviorally justified.
- Preserve the cron split between RSS refresh and TV HTML generation.
- Treat IMDb as unstable and verify live behavior before “fixing” selectors.
- Add debug logging only when it helps isolate timing, readiness, or data-source choice.
- When debugging scraper correctness, distinguish:
  - real source-data absence
  - selector breakage
  - wrong-title matching
- `eztv.rss.js` decides which episode is the current representative for a show.
- `tvshows.puppeteer.js` decides which IMDb title that show maps to and what metadata is usable.
- `shows.json` is a performance cache, not the source of truth.
- `eztv.rss.json` is the TV input source of truth.