206 changes: 31 additions & 175 deletions CHANGELOG.md
@@ -1,177 +1,33 @@
# Changelog

This file is append-only. Entries record public project changes that are
too detailed for the README but useful for maintainers, reviewers, and
future contributors.

## 2026-05-08 — UI Redesign & Article 16 Political Palette

The listing card UI has been redesigned to improve scannability and accessibility.
- **Typography & Hierarchy**: Increased institution font size to 28px and adjusted line-heights to establish a clearer visual order. The listings column is now capped at 740px to improve line length for readability.
- **Article 16 Palette**: Shifted the reservation status pill to a Bahujan-political color register:
  - **Ambedkar Blue**: Confirmed roster disclosure or Special Recruitment Drive (SRD).
  - **Saffron**: Institutional exclusion / No reservation (Private Universities).
  - **Grey**: Unclear or undisclosed status.
- **Accessibility**: Increased tap areas for the star/save buttons (WCAG 2.5.5 Target Size, a Level AAA criterion) and re-anchored footer popovers to prevent viewport overflow.
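
The status-to-colour mapping above could be sketched roughly as follows. Only the three colour roles come from this entry; the status keys and the returned palette names are hypothetical, chosen for illustration rather than taken from the project's code.

```javascript
// Hypothetical status keys and palette names. Only the three colour roles
// (Ambedkar Blue, Saffron, Grey) are stated in the changelog entry.
const PILL_PALETTE = {
  roster_disclosed: "ambedkar-blue",          // confirmed roster disclosure
  special_recruitment_drive: "ambedkar-blue", // SRD
  no_reservation: "saffron",                  // institutional exclusion, e.g. private universities
};

/** Return the palette name for a reservation status, defaulting to grey. */
function pillColour(status) {
  // hasOwnProperty guards against inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(PILL_PALETTE, status)
    ? PILL_PALETTE[status]
    : "grey"; // unclear or undisclosed status
}
```

Defaulting to grey means an unrecognised or missing status is shown as "unclear" rather than being silently coloured as a positive or negative finding.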

## 2026-05-08 — `sansad-semantic-crawler` bumped to v0.4.0

`requirements.txt` now pins the upstream crawler at
[`v0.4.0`](https://github.com/CommonerLLP/sansad-semantic-crawler/releases/tag/v0.4.0).
This release automates the **politician enrichment layer**:

- **Automated Party/State Lookup**: Question records now include an `asker_details`
block (party, party_name, state, house) pulled from the latest official
member lists. No more manual party-mapping in `consolidate_corpus.py`.
- **Committee Composition Rosters**: The crawler can now fetch the full
roster of parliamentary standing committees using a hybrid API and
PDF/LLM strategy. This enables tracking how committee membership
(and the political balance within) changes from one report to the next.
- **Refactored Base Architecture**: Improved provenance tracking and
PDF sanity checks.

This bump is **behaviour-preserving**: the single-schema assumption for
questions remains intact, but manifests will now contain richer metadata
by default.
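
As an illustration of what the richer metadata enables, the sketch below tallies questions by party from an `asker_details` block. The field names (`party`, `party_name`, `state`, `house`) are from this entry; the sample values are placeholders, and whether the block is a single object or a list per question is an assumption, so the helper accepts both.

```javascript
// Placeholder records. Only the asker_details field names come from the
// changelog; every value here is made up for illustration.
const sampleRecords = [
  { qno: 101, asker_details: { party: "PA", party_name: "Party A (placeholder)", state: "State X", house: "LS" } },
  { qno: 102, asker_details: { party: "PB", party_name: "Party B (placeholder)", state: "State Y", house: "RS" } },
  { qno: 103, asker_details: { party: "PA", party_name: "Party A (placeholder)", state: "State Z", house: "LS" } },
  { qno: 104 }, // a record without the enrichment block
];

/** Count questions per party code; records lacking details count as UNKNOWN. */
function countByParty(records) {
  const counts = {};
  for (const rec of records) {
    // Assumption: asker_details may be one object or an array of objects.
    const details = [].concat(rec.asker_details ?? []);
    const parties = details.length
      ? details.map((d) => d.party ?? "UNKNOWN")
      : ["UNKNOWN"];
    for (const p of parties) counts[p] = (counts[p] || 0) + 1;
  }
  return counts;
}
```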

## 2026-05-06 — `sansad-semantic-crawler` bumped to v0.2.0

`requirements.txt` now pins the upstream crawler at
[`v0.2.0`](https://github.com/CommonerLLP/sansad-semantic-crawler/releases/tag/v0.2.0)
(was `v0.1.0`). The new release ships a **pluggable-classifier
architecture** — regex (default, back-compat), embeddings (Sentence
Transformers anchor similarity), LLM (OpenAI-compat chat-completions
JSON tagging), or ensemble (combine modes). Optional pip extras:
`[embeddings]`, `[llm]`, `[all]`. The package never ships model
weights; users supply their own runtime (Ollama, vLLM, llama.cpp,
mlx-lm, transformers, or any OpenAI-compatible hosted service).

This bump is **behaviour-preserving for this project**:
`notes/topics/cei-vacancies.json` does not declare a `classifier`
block, so v0.2.0 transparently falls back to the regex classifier
that drove every Gap chart prior to today. Adopting embeddings or
LLM modes is a separate, opt-in editorial decision.
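
The fallback described above can be pictured with a small sketch. The four mode names are from the release notes quoted in this entry; the `mode` key inside the `classifier` block is a hypothetical name, and this is not the package's actual resolution code.

```javascript
// Mode names as listed in the v0.2.0 notes above.
const KNOWN_MODES = ["regex", "embeddings", "llm", "ensemble"];

/**
 * Pick the classifier mode for a topic profile.
 * Assumption: the mode lives at profile.classifier.mode (hypothetical key).
 */
function resolveClassifierMode(profile) {
  const block = profile && profile.classifier;
  if (!block) return "regex"; // no classifier block: back-compat regex default
  if (!KNOWN_MODES.includes(block.mode)) {
    throw new Error(`Unknown classifier mode: ${block.mode}`);
  }
  return block.mode;
}
```

Under this reading, a profile like `cei-vacancies.json` that declares no `classifier` block resolves to `"regex"`, which is why the bump does not change existing output.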

`make test` (153 tests + 9 skipped) and `npm test` (126 tests across 11 files)
are unaffected. The bump touches one line in `requirements.txt`.

## 2026-05-06 — Parliamentary-corpus crawler extracted and externalised

The three legacy scripts that built *The Gap*'s parliamentary corpus
— `scripts/sansad_crawl.py` (291 lines), `scripts/sansad_rs_crawl.py`
(215 lines), and `scripts/sansad_download_pdfs.py` (125 lines) — are
**retired**. Their behaviour (LS DSpace API + RS rsdoc.nic.in API +
PDF discovery + dedup-on-resume) now lives in a separately-released
public-good package, **`sansad-semantic-crawler`**, hosted at
[github.com/CommonerLLP/sansad-semantic-crawler](https://github.com/CommonerLLP/sansad-semantic-crawler)
and pinned at `v0.1.0` in `requirements.txt`.

The package is config-driven: it expects a topic-profile JSON that
encodes search groups, ministry filters, and tag rules. The faculty-
vacancy / reservation / Mission-Mode lens that drove the legacy
scripts is now `notes/topics/cei-vacancies.json` — gitignored, since
the analytical lens is project-specific and the public package ships
only a `libraries.json` example for `theright2read`.

What the host project picks up in exchange:

- **One canonical schema for both houses.** The legacy LS manifest
used `questiontype` / `questionno` / `members`; the legacy RS
manifest used `qtype` / `qno` / `asker`. The package emits
`qtype` / `qno` / `askers` directly for both houses.
`scripts/consolidate_corpus.py` is rewritten to consume that single
schema, which dropped roughly half its branching logic.
- **Stable `key` field on every record** (`LS|U|178|2024-11-25` /
`RS|S|365|2025-07-23`), so dedup is no longer a per-script
computation. This was always how `consolidate_corpus.py`
internally normalised — it is now a guaranteed property of the
upstream manifest.
- **Re-usable PDF naming** — the package writes LS PDFs to
`data/_sansad_crawl/pdfs/ls/` and RS PDFs to
`.../pdfs/rs/`. Filenames match the legacy convention
(`{qtype-letter}{qno}_{slug}.pdf`), so existing PDFs can be moved
into the new tree without re-download if the maintainer chooses
to skip the full re-crawl.
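
To make the schema unification concrete, here is a rough sketch of mapping the two legacy shapes onto `qtype` / `qno` / `askers` and deriving the stable `key`. The legacy field names and the key layout follow the examples above; the `date` field name, the `"STARRED"` / `"UNSTARRED"` values, and the helper names are assumptions, and the package itself is Python rather than JavaScript.

```javascript
/** Map a legacy LS or RS manifest record onto the canonical field names. */
function toCanonical(house, legacy) {
  if (house === "LS") {
    return { house, qtype: legacy.questiontype, qno: legacy.questionno,
             askers: [].concat(legacy.members ?? []), date: legacy.date };
  }
  if (house === "RS") {
    return { house, qtype: legacy.qtype, qno: legacy.qno,
             askers: [].concat(legacy.asker ?? []), date: legacy.date };
  }
  throw new Error(`Unknown house: ${house}`);
}

/** Build a key shaped like "LS|U|178|2024-11-25" (house|type letter|number|date). */
function recordKey(rec) {
  // Assumption: qtype is a word whose first letter is the type letter.
  const letter = String(rec.qtype).trim().charAt(0).toUpperCase();
  return [rec.house, letter, rec.qno, rec.date].join("|");
}

/** Keep the first record seen for each key. */
function dedupe(records) {
  const byKey = new Map();
  for (const rec of records) {
    const key = recordKey(rec);
    if (!byKey.has(key)) byKey.set(key, { ...rec, key });
  }
  return [...byKey.values()];
}
```

With the key computed once upstream, a consumer only needs the `Map` lookup in `dedupe` and no per-house branching.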

Operational entry points: `make corpus-refresh` (full pipeline:
crawl → parse → consolidate). See `make help` for the per-step
targets.

## 2026-05-06 — Test counts + repo-layout refresh

The 2026-05-05 entry below describes a frontend test floor of 81 Vitest
tests across 4 files (`sanitize`, `classify`, `excerpt`, `schema`) and
notes that 5 lib modules still lacked coverage. Both numbers are
superseded:

- **Python**: 153 tests + 9 skipped (was 119).
- **Vitest**: 117 tests across 11 files (was 81 across 4).
- 11 of 13 `docs/lib/` modules now have at least smoke / contract
coverage: `sanitize`, `classify`, `excerpt`, `schema`,
`current-validator`, `card-helpers`, `render-card`, `filters`,
`map`, `render-tabs`, `search`. The two without dedicated unit
tests are `charts.js` (chart data + Resources-tab payload) and
`state.js` (a thin shared mutable-state holder); both are exercised
indirectly by the higher-level tests but warrant direct contracts
next time they're touched.

The Repository-layout block is also updated above to reflect the
public-tree files added since the original was written:
`docs/lib/`, `docs/MISTAKES.md`, `docs/PARSER-ARCHITECTURE.md`,
`LICENSE`, `CITATION.cff`, `package.json`, `tests/`. The orphaned
`TECHDEBT.md` line is removed: that file is part of the maintainer's
private working notes (`/notes/` is gitignored), not the public
tree, so the original layout entry was always pointing at a path
that GitHub never sees.

## 2026-05-06 — Project relicensed to non-commercial terms

The Licence section above is rewritten as of this date. Prior to
2026-05-06 the project shipped under MIT (code) and CC BY-SA 4.0
(data); both permitted commercial use. From 2026-05-06 forward,
the project is non-commercial source-available: PolyForm
Noncommercial 1.0.0 for code, CC BY-NC-SA 4.0 for data and corpus.

The change is not retroactive against existing users — both MIT and
CC BY-SA 4.0 are perpetual for any recipient who exercised rights
under them. New copies of the code and data going forward are
governed by the new terms.

The intent: this work is funded indirectly by the Indian public,
exists to surface a public-interest argument, and should never
become a commercial product — anyone's, including the maintainer's.
The site footer and the colophon disclaimer on every page are
updated to match. A `LICENSE` file with the canonical PolyForm
text and a `CITATION.cff` are added at the repository root.

## 2026-05-05 — Phase 2 frontend refactor

The "Repository layout" block above describes `docs/app.js` as "all
SPA logic" — that was true at the time of writing. As of commit
`2d50c7c`, `app.js` is **728 lines of orchestration only** (imports,
`loadData`, `render()`, tab routing, event wiring); the bulk of the
SPA logic now lives in **9 ESM modules under `docs/lib/`**:

| Module | Purpose |
|---|---|
| `lib/sanitize.js` | `escapeHTML` / `safeUrl` / URL allowlist *(tested)* |
| `lib/schema.js` | Zod schemas for runtime + test-time validation *(tested)* |
| `lib/classify.js` | Field tags / position rank / listing quality *(tested)* |
| `lib/excerpt.js` | `raw_text_excerpt` sanitiser *(tested)* |
| `lib/charts.js` | Vacancies tab + The Gap charts + resources data |
| `lib/state.js` | Shared mutable state holder (`state.ADS`, `state.SAVED`, etc.) |
| `lib/card-helpers.js` | Per-card cue extractors and rank/discipline formatters |
| `lib/render-card.js` | `renderAd()` + hiring-trap detection + card wiring |
| `lib/filters.js` | Filter / sort / search + reactive facet counts |
| `lib/map.js` | Leaflet init + marker updates |
| `lib/render-tabs.js` | Resources / Saved / Coverage tab renderers |

Frontend tests live under `tests/` (Vitest); 81 tests across 4 files
covering `sanitize`, `classify`, `excerpt`, and `schema`. Run with
`npm test`. The remaining 5 lib modules ship without unit tests yet —
backfilling those is deliberate next-step work.

The "Running it locally" `make test` command above runs the 119-test
**Python** scraper suite. Frontend tests run separately via `npm test`.
Both should be green before opening a PR that touches their respective
trees.
## 2026-05-10
- **feat: comprehensive accessibility remediation for whoseuniversity.org**
  - Implemented findings from the design audit to ensure the forensic record is accessible to researchers using assistive technologies.
  - **Semantic Navigation:** Corrected heading hierarchy in "The Gap" report.
  - **ARIA Labeling:** Added `aria-label` to search inputs and descriptive `<title>` tags to SVGs.
  - **Focus Management:** Updated filter popovers to capture focus on open and return it on close.
  - **Map Accessibility:** Enabled keyboard navigation for Leaflet markers.
- **feat: custom SVG map markers with institution-specific icons**
  - Replaced `L.circleMarker` with custom `L.divIcon` SVG pins for improved visual hierarchy.
  - Assigned unique icons for Universities (🎓), Technical HEIs (⚙️), and Management Institutions (📊).
  - Implemented dynamic 'Active' state styling.
- **feat: accessible data-pill map markers with representative palette**
  - Implemented Airbnb-style 'Data Pills' showing the number of jobs.
  - Added institutional symbols to active pills for color-blind accessibility.
  - Applied representative color palette: Saffron for IIM/Private, Light Blue for IIT/IISc, Ambedkar Blue for Central Universities.
- **feat: Airbnb-style marker clustering**
  - Integrated `Leaflet.markercluster` to bunch markers at low zoom levels.
  - Implemented layered discovery: National -> Regional -> Institutional.
  - Enhanced cluster pills to show both institutional count and total job count.
- **fix: test suite hardening**
  - Centralized `localStorage` mock in `tests/setup.js` to resolve conflicting mocks.
  - Created `vitest.config.js` to standardize the test environment.
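
The cluster-pill behaviour described above (institution count plus total job count) might look something like the sketch below. It assumes each marker was created with custom `institutionId` and `jobCount` options, which are hypothetical names; the markup and class name are also illustrative. The Leaflet wiring is left as a comment because it needs a browser with Leaflet and Leaflet.markercluster loaded.

```javascript
/**
 * Summarise the markers inside one cluster.
 * Assumption: markers carry options.institutionId and options.jobCount.
 */
function summariseCluster(markers) {
  const institutions = new Set();
  let jobs = 0;
  for (const m of markers) {
    institutions.add(m.options.institutionId);
    jobs += m.options.jobCount || 0;
  }
  return { institutions: institutions.size, jobs };
}

/** Render the pill markup for a cluster summary (class name is illustrative). */
function clusterPillHtml(summary) {
  return `<span class="cluster-pill">${summary.institutions} inst · ${summary.jobs} jobs</span>`;
}

// Browser-only wiring, using the plugin's iconCreateFunction option:
// const group = L.markerClusterGroup({
//   iconCreateFunction: (cluster) =>
//     L.divIcon({
//       html: clusterPillHtml(summariseCluster(cluster.getAllChildMarkers())),
//       className: "",
//     }),
// });
```

Keeping the counting and markup in plain functions lets them be unit-tested under Vitest without instantiating a map.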

## 2026-05-09
- **chore: security posture and privacy purge**
  - Rotated keys, purged git history, and enforced local-only policy for `CLAUDE.md`, `AGENTS.md`, and `MISTAKES.md`.
  - Updated `.gitignore` across all repos.
- **docs: unified READMEs and handoff protocols**
  - Consolidated per-repo instructions into a single `_org/` source of truth.

... (older entries)
7 changes: 6 additions & 1 deletion docs/index.html
@@ -37,6 +37,9 @@
<meta name="twitter:image" content="https://whoseuniversity.org/og.png" />

<link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css" />
+<link rel="stylesheet" href="https://unpkg.com/leaflet.markercluster@1.4.1/dist/MarkerCluster.css" />
+<link rel="stylesheet" href="https://unpkg.com/leaflet.markercluster@1.4.1/dist/MarkerCluster.Default.css" />

<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<!-- Typography: switched the display face from Source Serif 4 (the Google
@@ -50,6 +53,7 @@
script-src locked to 'self' without 'unsafe-inline'. -->
<script src="theme-init.js"></script>
<script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
+<script src="https://unpkg.com/leaflet.markercluster@1.4.1/dist/leaflet.markercluster.js"></script>
<link rel="stylesheet" href="styles.css" />
</head>
<body>
@@ -108,7 +112,7 @@ <h1><span class="masthead-title">Whose University?</span><span class="masthead-s
<span class="strip-label">Filter:</span>
<label class="filter-search" for="search">
<span class="search-ico" aria-hidden="true">⌕</span>
-<input id="search" type="search" value="" autocomplete="off" placeholder="Search ML, postdoc, STS, policy…" />
+<input id="search" type="search" value="" autocomplete="off" placeholder="Search ML, postdoc, STS, policy…" aria-label="Search advertisements" />
</label>

<!-- Reserved posts toggle: a first-class chip for the dashboard's most
@@ -342,6 +346,7 @@ <h3 class="about-h3">Corrections</h3>
Open an issue at
<a class="github-link" href="https://github.com/commonerllp/academiaindia" target="_blank" rel="noopener" aria-label="GitHub repository: commonerllp/academiaindia">
<svg class="github-icon" viewBox="0 0 16 16" aria-hidden="true" focusable="false">
+<title>GitHub</title>
<path d="M8 0C3.58 0 0 3.67 0 8.2c0 3.62 2.29 6.69 5.47 7.77.4.08.55-.18.55-.4 0-.19-.01-.84-.01-1.53-2.01.38-2.53-.5-2.69-.96-.09-.24-.48-.96-.82-1.16-.28-.16-.68-.56-.01-.57.63-.01 1.08.59 1.23.83.72 1.24 1.87.89 2.33.68.07-.53.28-.89.51-1.09-1.78-.21-3.64-.91-3.64-4.04 0-.89.31-1.62.82-2.19-.08-.21-.36-1.04.08-2.16 0 0 .67-.22 2.2.84A7.4 7.4 0 0 1 8 3.95c.68 0 1.36.09 2 .28 1.53-1.06 2.2-.84 2.2-.84.44 1.12.16 1.95.08 2.16.51.57.82 1.3.82 2.19 0 3.14-1.87 3.83-3.65 4.04.29.25.54.76.54 1.54 0 1.11-.01 2-.01 2.27 0 .22.15.48.55.4A8.09 8.09 0 0 0 16 8.2C16 3.67 12.42 0 8 0Z" />
</svg>
<span>GitHub</span>
7 changes: 4 additions & 3 deletions docs/lib/card-helpers.js
@@ -17,12 +17,13 @@ import { safeUrl, resolveUrl, escapeRegExp } from "./sanitize.js";

/** Type-to-colour map used by the card type chip and the map markers. */
export const TYPE_COLORS = {
-  IIT: "#1F4E79", IIM: "#2d6a4f", IISc: "#6b21a8", IISER: "#b45309",
-  NIT: "#64748b", IIIT: "#0e7490", CentralUniversity: "#92400e",
+  IIT: "#58a6ff", IIM: "#F47C20", IISc: "#58a6ff", IISER: "#58a6ff",
+  NIT: "#64748b", IIIT: "#0e7490", CentralUniversity: "#000080",
   StateUniversity: "#9a3412",
-  PrivateUniversity: "#7c3aed",
+  PrivateUniversity: "#F47C20",
};


export function detectAdCampus(ad) {
const text = `${ad.title || ""} ${ad.raw_text_excerpt || ""}`;
if (!text) return null;