feat(parliament): surface sansad-semantic-crawler v1.0.0 discourse layer #9

Merged: skishchampi merged 1 commit into main from feat/parliament-classifier-v2 on May 12, 2026


@skishchampi
Contributor

Summary

Reframes the data-page parliament section from "what MPs ask" to "how the State responds when asked." Consumes the v1.0.0 analytical pipeline (extract-answers → analyse-discourse → analyse-ministry) and joins it into the public dataset.

Pipeline

  • Makefile — new targets: corpus-extract-answers, corpus-analyse-discourse, corpus-analyse-ministry, corpus-analyse, corpus-enrich. corpus-refresh now runs the full pipeline end-to-end.
  • scripts/build_parliament_libraries.py (new) — joins manifest.jsonl + analysis_discourse.jsonl + ministry_summary_qa.jsonl into the public JS export. Emits four new top-level keys: discourseSummary, ministryDiscourse, discourseExcerpts, rrrlfDeflections.
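
The join described in the Pipeline list can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/build_parliament_libraries.py: the "id" join key, the "label" and "tags" fields, ISO-formatted dates, and the helper names read_jsonl and build_export are all assumptions. Only the three input file names and the four output keys come from this PR.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse one JSON object per non-blank line of a .jsonl file."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def build_export(manifest, discourse, ministry_summary):
    """Join classifier rows onto manifest rows and group them under export keys.

    Hypothetical fields: "id" (join key), "label" on discourse rows,
    "tags" and an ISO-formatted "date" on manifest rows.
    """
    by_id = {m["id"]: m for m in manifest}
    # Keep only classifier rows that have a matching manifest entry.
    joined = [{**by_id[d["id"]], **d} for d in discourse if d["id"] in by_id]
    deflections = [
        r for r in joined
        if r.get("label") == "FEDERAL_DEFLECTION" and "rrrlf" in r.get("tags", [])
    ]
    return {
        "discourseSummary": {"classified": len(joined)},
        "ministryDiscourse": ministry_summary,
        "discourseExcerpts": joined,
        # ISO date strings sort chronologically as plain strings.
        "rrrlfDeflections": sorted(deflections, key=lambda r: r.get("date", "")),
    }
```

In this sketch a caller would pass the parsed contents of manifest.jsonl, analysis_discourse.jsonl, and ministry_summary_qa.jsonl, then serialise the returned dict into the JS export.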

Surface

  • Headline stat: Of 105 classified responses to library questions, 64 (61%) were evasive.
  • "Library is a state subject" cascade: every FEDERAL_DEFLECTION response on rrrlf-tagged questions, sorted by date. The same five words appear in 1998 (HRD) and 2018 (Culture × 3) — across two decades and two political dispensations. Pairs with the existing RRRLF "two decades, a nose dive" section directly above it: Centre underfunded → Centre evaded when asked why.
  • Per-ministry evasion bars with classified-N denominators (HRD: 60% on N=15 of 83; Culture: 40% on N=15 of 67) — honest framing for the small classifier sample.
  • Taxonomy of evasion: single-column row list, one verbatim phrase per label as the hero (italic quote), one-line citation, collapsible full passage. Labels covered: REJECTED, SUBSTITUTED, FEDERAL_DEFLECTION, DEFLECTED, DATA_WITHHELD, STRUCTURAL_REFUSAL, CONSTITUTIONAL_DEFAULT, REPRESENTATIONAL_SILENCE.
  • Method note with corpus sizes + classifier name + repo link.
  • Old keyQuestions + topTags rendering demoted to a collapsible <details> block.
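
The rates quoted in the Surface list divide evasive responses by classified responses, not by all responses, which is why each bar also carries its classified and total N. A minimal sketch of that arithmetic, assuming hypothetical helper names (evasion_pct, ministry_label) and an evasive count of 9 for HRD that is back-computed from "60% on N=15" rather than stated in this PR:

```python
def evasion_pct(evasive, classified):
    """Percentage of classified responses that were evasive, rounded to an integer."""
    if classified <= 0:
        raise ValueError("no classified responses to compute a rate from")
    return round(100 * evasive / classified)


def ministry_label(name, evasive, classified, total):
    """Format a bar label that shows the rate together with its denominators."""
    return f"{name}: {evasion_pct(evasive, classified)}% on N={classified} of {total}"


print(evasion_pct(64, 105))                # headline stat from this PR: 61
print(ministry_label("HRD", 9, 15, 83))    # HRD: 60% on N=15 of 83
```

Keeping the total alongside the classified N makes it visible that, for example, only 15 of HRD's 83 responses were classified.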

Followup tracked separately

Issue CommonerLLP/sansad-semantic-crawler#41 — the v1.0.0 discourse classifier is voice-blind. Adding voice/passive_ratio/agent_named lands as v1.1.0 in the upstream package; this consumer is structured to receive those fields when they arrive.

Test plan

  • CI passes (link checker + html validation)
  • After merge, load /data/#parliament:
    • Headline stat renders with red 61% and the lede sentence
    • RRRLF cascade shows the 1998 + three 2018 deflections, dates left-rail
    • Per-ministry bars render with rate + classified/total N
    • Each evasion row renders the italic pattern + cite line; "Read the passage" expands
    • Method note lists corpus stats and links the upstream repo

Reframes the data-page parliament section from "what MPs ask" to "how the
State responds when asked." Consumes the v1.0.0 analytical pipeline
(extract-answers → analyse-discourse → analyse-ministry) and joins it
into the public dataset.

- Makefile: corpus-extract-answers, corpus-analyse-discourse,
  corpus-analyse-ministry, corpus-analyse, corpus-enrich. corpus-refresh
  now chains the full pipeline.
- scripts/build_parliament_libraries.py: joins manifest +
  analysis_discourse + ministry_summary_qa into assets/parliament_libraries.js.
  Emits new top-level keys: discourseSummary, ministryDiscourse,
  discourseExcerpts, rrrlfDeflections.
- data/index.html, assets/main.js, assets/styles.css: new section
  structure — headline evasion rate, "Library is a state subject"
  cascade (FEDERAL_DEFLECTION on RRRLF-tagged questions, sorted by
  date), per-ministry evasion bars with classified-N denominators,
  taxonomy of evasion (one verbatim phrase per label, with collapsible
  full passage), method note. Old keyQuestions / topTags grid demoted
  to a collapsible details element.
- Cache-bust v=43 → v=44 across index.html, data/index.html,
  inequality/index.html.
@skishchampi merged commit 5ba100c into main on May 12, 2026
2 checks passed
@skishchampi deleted the feat/parliament-classifier-v2 branch on May 12, 2026 at 15:00

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d462c0e7f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

"qtype": m.get("qtype", ""),
"qno": m.get("qno", ""),
"date": m.get("date", ""),
"title": escape(m.get("title", "")),

[P2] Stop pre-escaping titles in export builder

The builder escapes title before writing JSON, but the UI escapes the same field again in assets/main.js when rendering both the evasion rows and the RRRLF cascade. This double-escaping turns legitimate characters like & into visible entities (&amp;) whenever a parliamentary title contains special characters, so end users see corrupted text instead of the original title.

Useful? React with 👍 / 👎.
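
The double-escaping effect described in this comment can be reproduced in a few lines. This sketch uses Python's html.escape for both passes as a stand-in; in the PR the second pass happens in assets/main.js, and whether the builder's escape is html.escape is an assumption. The title is a made-up example.

```python
from html import escape

title = "Libraries & Archives"  # hypothetical title containing a special character

once = escape(title)   # what a pre-escaping builder would write into the export
twice = escape(once)   # what a second escape at render time would produce

print(once)   # Libraries &amp; Archives
print(twice)  # Libraries &amp;amp; Archives
```

A browser rendering the twice-escaped string as HTML displays the literal text "Libraries &amp; Archives", which is the corruption the review describes. Escaping in exactly one place, typically at render time, avoids it.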

Comment thread Makefile

# Join the upstream manifest export with the v1.0.0 analytical outputs
# into a single enriched assets/parliament_libraries.js.
corpus-enrich: corpus-export

[P2] Make corpus-enrich require analysis outputs

corpus-enrich only depends on corpus-export, but build_parliament_libraries.py reads analysis_discourse.jsonl and ministry_summary_qa.jsonl and silently treats missing files as empty arrays. As a result, running make corpus-enrich (or running corpus-refresh with parallel make) can produce a “successful” artifact with zeroed discourse metrics and missing sections instead of failing fast, which makes the published dataset silently incomplete.

Useful? React with 👍 / 👎.
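
One way to get the fail-fast behaviour this comment asks for is to have the builder refuse to continue when an analysis file is missing or empty. The following is a sketch under assumptions, not the current code in build_parliament_libraries.py; the function name read_required_jsonl is hypothetical.

```python
import json
from pathlib import Path


def read_required_jsonl(path):
    """Load a .jsonl file, raising instead of silently returning an empty list."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"required analysis output missing: {p}")
    with p.open(encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]
    if not rows:
        raise ValueError(f"required analysis output is empty: {p}")
    return rows
```

An uncaught exception makes the script exit non-zero, so make corpus-enrich would stop instead of publishing zeroed metrics. A complementary option, in line with the review, would be for corpus-enrich to also depend on the analysis targets in the Makefile.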
