feat(parliament): surface sansad-semantic-crawler v1.0.0 discourse layer #9
Reframes the data-page parliament section from "what MPs ask" to "how the State responds when asked." Consumes the v1.0.0 analytical pipeline (extract-answers → analyse-discourse → analyse-ministry) and joins it into the public dataset.

- Makefile: corpus-extract-answers, corpus-analyse-discourse, corpus-analyse-ministry, corpus-analyse, corpus-enrich. corpus-refresh now chains the full pipeline.
- scripts/build_parliament_libraries.py: joins manifest + analysis_discourse + ministry_summary_qa into assets/parliament_libraries.js. Emits new top-level keys: discourseSummary, ministryDiscourse, discourseExcerpts, rrrlfDeflections.
- data/index.html, assets/main.js, assets/styles.css: new section structure with a headline evasion rate, a "Library is a state subject" cascade (FEDERAL_DEFLECTION on RRRLF-tagged questions, sorted by date), per-ministry evasion bars with classified-N denominators, a taxonomy of evasion (one verbatim phrase per label, with collapsible full passage), and a method note. The old keyQuestions / topTags grid is demoted to a collapsible details element.
- Cache-bust v=43 → v=44 across index.html, data/index.html, inequality/index.html.
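The join performed by scripts/build_parliament_libraries.py is not shown in this conversation. The following is only a minimal sketch of the general shape such a join could take. The `id` join key, the `discourse` field, and the `PARLIAMENT_LIBRARIES` global name are all hypothetical and are not taken from the actual script.

```python
import json


def join_by_key(manifest, analysis, key="id"):
    """Left-join analysis rows onto manifest rows.

    `key` is a hypothetical shared identifier; the real files may use
    a different field or a composite key.
    """
    by_key = {row[key]: row for row in analysis}
    # Manifest rows without a matching analysis row get discourse=None.
    return [{**m, "discourse": by_key.get(m[key])} for m in manifest]


def to_js_export(payload, global_name="PARLIAMENT_LIBRARIES"):
    """Serialise the payload as a JS assignment (global name is hypothetical)."""
    return f"window.{global_name} = {json.dumps(payload, ensure_ascii=False)};\n"
```

Keeping unmatched manifest rows with an explicit `None` makes it possible to report how many questions were actually classified, which is what a classified-N denominator needs.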
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d462c0e7f
    "qtype": m.get("qtype", ""),
    "qno": m.get("qno", ""),
    "date": m.get("date", ""),
    "title": escape(m.get("title", "")),
Stop pre-escaping titles in export builder
The builder escapes title before writing JSON, but the UI escapes the same field again in assets/main.js when rendering both the evasion rows and the RRRLF cascade. This double-escaping turns legitimate characters like & into visible entities (&amp;) whenever a parliamentary title contains special characters, so end users see corrupted text instead of the original title.
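The effect can be reproduced in a few lines. This sketch assumes the builder's `escape` is Python's `html.escape` (the import is not visible in the diff) and uses a second `escape` call as a stand-in for the render-time escaping in assets/main.js. The sample title and the `build_record` helper are hypothetical.

```python
from html import escape, unescape

raw_title = "Libraries & Reading Rooms"  # hypothetical title

# First escape: what the builder currently writes into the export.
once = escape(raw_title)
# Second escape: stand-in for the UI escaping the field again at render time.
twice = escape(once)

# A browser decodes entities once when displaying text, so the reader sees
# the once-escaped form, with a literal "&amp;" in the title.
displayed = unescape(twice)


def build_record(m: dict) -> dict:
    """Keep the title raw in the export so the renderer escapes exactly once."""
    return {"title": m.get("title", "")}
```

Leaving the exported JSON unescaped and escaping only at the point of HTML insertion keeps the dataset reusable by consumers that are not rendering HTML.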
    # Join the upstream manifest export with the v1.0.0 analytical outputs
    # into a single enriched assets/parliament_libraries.js.
    corpus-enrich: corpus-export
Make corpus-enrich require analysis outputs
corpus-enrich only depends on corpus-export, but build_parliament_libraries.py reads analysis_discourse.jsonl and ministry_summary_qa.jsonl and silently treats missing files as empty arrays. As a result, running make corpus-enrich (or running corpus-refresh with parallel make) can produce a “successful” artifact with zeroed discourse metrics and missing sections instead of failing fast, which makes the published dataset silently incomplete.
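On the script side, one way to fail fast is to make the loader raise instead of returning an empty list. This is a sketch only: `load_jsonl_required` is a hypothetical helper, not a function from build_parliament_libraries.py, and treating an empty file as an error is an assumption about what the pipeline should consider valid.

```python
import json
from pathlib import Path


def load_jsonl_required(path: Path) -> list[dict]:
    """Load a JSON Lines file, raising if it is missing or has no rows."""
    if not path.is_file():
        raise FileNotFoundError(f"required analysis output missing: {path}")
    with path.open(encoding="utf-8") as fh:
        # Skip blank lines; every other line must be one JSON object.
        rows = [json.loads(line) for line in fh if line.strip()]
    if not rows:
        raise ValueError(f"required analysis output is empty: {path}")
    return rows
```

An uncaught exception makes the script exit non-zero, so make stops instead of publishing zeroed metrics. Listing the analysis targets as prerequisites of corpus-enrich would additionally address the parallel-make ordering problem.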
Summary
Reframes the data-page parliament section from what MPs ask to how the State responds when asked. Consumes the v1.0.0 analytical pipeline (extract-answers → analyse-discourse → analyse-ministry) and joins it into the public dataset.

Pipeline
- Makefile: new targets corpus-extract-answers, corpus-analyse-discourse, corpus-analyse-ministry, corpus-analyse, corpus-enrich. corpus-refresh now runs the full pipeline end-to-end.
- scripts/build_parliament_libraries.py (new): joins manifest.jsonl + analysis_discourse.jsonl + ministry_summary_qa.jsonl into the public JS export. Emits four new top-level keys: discourseSummary, ministryDiscourse, discourseExcerpts, rrrlfDeflections.

Surface
- FEDERAL_DEFLECTION response on rrrlf-tagged questions, sorted by date. The same five words appear in 1998 (HRD) and 2018 (Culture × 3), across two decades and two political dispensations. Pairs with the existing RRRLF "two decades, a nose dive" section directly above it: Centre underfunded → Centre evaded when asked why.
- keyQuestions + topTags rendering demoted to a collapsible <details> block.

Followup tracked separately
Issue CommonerLLP/sansad-semantic-crawler#41: the v1.0.0 discourse classifier is voice-blind. Adding voice / passive_ratio / agent_named lands as v1.1.0 in the upstream package; this consumer is structured to receive those fields when they arrive.

Test plan
/data/#parliament: