A public-interest tracker of academic work in India.
Live site: whoseuniversity.org
The site has two registers running in parallel:
- Vacancies. A live feed of advertised faculty, visiting, and postdoctoral positions across Central Universities, the IITs, IIMs, NITs, IIITs, AIIMS, and major private universities. Scraped from each institution's career page, occasionally PDF-parsed from notification documents, and (rarely) hand-transcribed from recruitment cards circulated outside the institutions' own websites. Every listing carries a provenance grade and a reservation row that names what the institution has — and has not — disclosed.
- The Gap. A research treatment of the parliamentary record on faculty vacancy in centrally-funded higher-education institutions from September 2020 to March 2026. 546 questions tabled across the Lok Sabha and Rajya Sabha; one question (RS Q.365 of 23 July 2025, asked by the Leader of Opposition) surfaced rank-by-category vacancy data; the rest were answered with cumulative "Mission Mode" recruitment counters that conflate appointment with vacancy. The Gap names this substitution and the institutions that perform it.
The full project framing, methodology, disclaimers, and citation are on the About page.
docs/data/institutions_registry.json ← the registry of ~185 institutions
│
▼
┌─────────────────────┐
│ scraper/run.py │ per-site parsers in scraper/parsers/, an HTTP
│ (orchestrator) │ cache, an SSRF guard, a daily archive snapshot
└──────────┬──────────┘
▼
docs/data/current.json ← GitHub Pages serves this
docs/data/coverage_report.json directly to the static SPA
docs/data/archive/YYYY-MM-DD.json
│
▼
docs/index.html + docs/styles.css + docs/app.js + docs/lib/ ← the SPA
│
▼
whoseuniversity.org (GitHub Pages)
The whole site is static. There is no backend, no database, no auth, no user accounts, no analytics. The scraper writes JSON; GitHub Pages serves JSON; the browser renders JSON.
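The SSRF guard mentioned in the diagram lives in `scraper/url_safety.py`; its real implementation (with 33 unit tests) is the source of truth. As a rough sketch of the technique — function name and exact policy here are assumptions, not the module's API — a guard of this kind refuses to fetch anything that isn't plain http(s) to a publicly routable address:

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_url(url: str) -> bool:
    """Reject URLs that could point the fetcher at internal
    infrastructure: only http(s) schemes, and every address the
    host resolves to must be globally routable (no loopback,
    private, or link-local ranges)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if not addr.is_global:  # loopback, private, link-local, reserved
            return False
    return True
```

A check like this matters because the scraper follows links found in scraped pages and PDFs, which are attacker-controllable inputs.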
The Makefile is the canonical entry point.
# install pinned dependencies
make deps
# scrape all institutions in the registry, write to docs/data/
make scrape
# scrape a 5-institution smoke test
make scrape ARGS='--limit 5'
# wipe the PDF cache before scraping (for ads that were updated upstream)
make scrape-fresh
# run the Python scraper test suite
make test
# run the frontend Vitest suite (127 tests across 11 files)
npm test
# serve docs/ locally for development
make serve # → http://localhost:8000/
# apply 30-day retention to docs/data/archive/
make prune-archive

The scraper writes its output to docs/data/, which is what GitHub Pages serves; there is no copy step. The weekly-sweep workflow runs the same make scrape on a GitHub Actions cron at 03:30 IST every Monday, validates docs/data/current.json, and opens a data-update PR instead of pushing directly to main.
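The validation step guards against publishing a broken feed. A minimal sketch of what such a check might look like — the listing field names (`url`, `title`) and the top-level shape are assumptions for illustration, not the real schema:

```python
import json
from pathlib import Path


def validate_current(path: Path = Path("docs/data/current.json")) -> int:
    """Fail loudly if the scrape produced an empty or malformed
    feed, so a data-update PR never ships a broken current.json.
    Returns the listing count on success."""
    data = json.loads(path.read_text(encoding="utf-8"))
    listings = data if isinstance(data, list) else data.get("listings", [])
    assert listings, "scrape produced zero listings; refusing to publish"
    for ad in listings:
        # 'url' and 'title' are assumed field names for illustration
        assert ad.get("url") and ad.get("title"), f"incomplete listing: {ad}"
    return len(listings)
```

Failing the workflow on an empty feed is the important design choice: a scraper that silently writes `[]` after an upstream site redesign would otherwise wipe the live site.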
whoseuniversity/
├── docs/ ← what GitHub Pages serves
│ ├── index.html ← markup + theme-flash script
│ ├── styles.css ← all stylesheets
│ ├── app.js ← SPA orchestration: data load, render, tab routing, events
│ ├── lib/ ← ESM modules for cards, filters, charts, schema, search, map
│ ├── favicon.svg ← oxblood "?" mark
│ ├── og.svg / og.png ← social-share card
│ └── data/
│ ├── current.json ← live listings (the SPA reads this)
│ ├── coverage_report.json ← which parsers worked
│ ├── institutions_registry.json
│ └── vacancy_snapshots.json ← the parliamentary corpus's structured data
│
├── scraper/ ← Python; runs locally or on Actions
│ ├── run.py ← orchestrator
│ ├── schema.py ← Pydantic JobAd / Institution / enums
│ ├── ad_factory.py ← canonical JobAd builder + stable_id
│ ├── url_safety.py ← shared SSRF guard (33 unit tests)
│ ├── fetch.py ← HTTP layer with cache
│ ├── pdf_extractor.py ← PDF → text
│ ├── prune_archive.py ← retention helper
│ └── parsers/ ← one file per institution / institution-family
│
├── scripts/ ← analytical helpers
├── .github/workflows/ ← weekly-sweep CI
├── Makefile ← entry points
├── requirements.txt ← pinned
├── package.json ← Vitest dev-tooling only; the site ships zero npm packages
├── tests/ ← Vitest suite for `docs/lib/` modules
├── CONTRIBUTING.md ← parser contract + contribution priorities
├── CHANGELOG.md ← dated project changes
├── LICENSE ← PolyForm Noncommercial 1.0.0
├── CITATION.cff ← machine-readable citation metadata
└── README.md ← this file
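The canonical JobAd model is the Pydantic schema in `scraper/schema.py`, and the real `stable_id` lives in `scraper/ad_factory.py`; neither is reproduced here. As a hedged sketch of the general shape — field names and the hashing scheme are assumptions, and a plain dataclass stands in for Pydantic:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class JobAd:
    institution: str  # registry key, e.g. "iit-delhi" (hypothetical)
    title: str
    url: str


def stable_id(ad: JobAd) -> str:
    """Deterministic identifier so the same advertisement keeps the
    same id across scrapes: a hash over fields that should not change
    between runs (which fields qualify is an assumption here)."""
    key = f"{ad.institution}|{ad.title}|{ad.url}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```

A content-derived id (rather than a scrape-order index) is what lets the daily archive snapshots in docs/data/archive/ be diffed meaningfully across days.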
The parliamentary PDFs that drive The Gap and the OCR text extracts
of those PDFs are not in this repository. They are public records
on sansad.in (Lok Sabha at elibrary.sansad.in, Rajya Sabha at
rsdoc.nic.in); the analysis is open and the bibliography on The Gap
links each chart to its source. If you want the consolidated corpus
for your own research, write to the maintainer.
The site grows by adding institutions. The fastest way to contribute is to add or fix a parser for an institution that's not yet covered.
Two contribution areas are especially urgent:
- Central Universities via Samarth (`curec.samarth.ac.in`). Most Central Universities have moved their recruitment listings onto the Samarth eGov public-search portal. Coverage of Central Universities in this tracker is currently thin because each university's own site doesn't always carry the listing — Samarth does. A parser that hits `curec.samarth.ac.in` and returns one `JobAd` per public listing would unblock dozens of institutions in one shot.
- State-government universities. The site currently covers centrally-funded HEIs (CUs, IITs, IIMs, NITs, IIITs, AIIMS) and a handful of major private universities. State-government universities — Andhra University, Anna University, Calcutta University, Jadavpur University, University of Madras, Mumbai University, etc. — are absent. They serve far more students than the central system does and are systematically excluded from the parliamentary disclosure regime The Gap analyses, which makes their hiring practices doubly opaque. Adding state universities expands the site's reach substantially.
See CONTRIBUTING.md for the parser contract, testing setup, and a priority queue.
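The parser contract itself is defined in CONTRIBUTING.md. As a rough sketch of the shape a per-institution parser takes — the function name, keyword filter, return keys, and stdlib-only HTML handling are illustrative assumptions; real parsers target each site's actual markup:

```python
from html.parser import HTMLParser


class _AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for anchors — a stand-in for
    site-specific parsing of a careers page."""

    def __init__(self):
        super().__init__()
        self.links, self._href, self._text = [], None, []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_listings(html: str, base_url: str) -> list[dict]:
    """Turn one careers page into one dict per advertised post.
    The keyword filter and dict keys are illustrative assumptions."""
    collector = _AnchorCollector()
    collector.feed(html)
    keywords = ("professor", "faculty", "postdoc")
    return [
        {"title": text,
         "url": href if href.startswith("http") else base_url + href}
        for href, text in collector.links
        if any(k in text.lower() for k in keywords)
    ]
```

A real parser would return canonical `JobAd` objects via the builder in `scraper/ad_factory.py` rather than bare dicts, and would fetch through `scraper/fetch.py` so the HTTP cache and SSRF guard apply.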
The site reads as journalism, not as a service. The political register — evidence-grounded, naming institutions by name — is deliberate. Pull requests that soften the analysis, that generalise to "stakeholders" or "marginalised communities" instead of naming SC, ST, OBC, EWS, and PwBD candidates specifically, or that try to balance The Gap's claims with unsourced institutional-perspective disclaimers will be declined.
This project is non-commercial source-available. Both restrictions matter equally:
- Code: PolyForm Noncommercial 1.0.0. Source-available and modifiable for research, education, journalism, public-interest work, and personal use. Commercial use is not permitted under any circumstance.
- Data and corpus: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). Free for non-commercial reuse with attribution and share-alike. Cite as: Whose University?, whoseuniversity.org, accessed YYYY-MM-DD.
These are public records of the Indian state, processed and published as a public-interest research artefact. The licences ensure they stay that way — no commercial product can be built on top of this work, by anyone.
A CITATION.cff at the repository root carries machine-readable
citation metadata; GitHub renders it as a "Cite this repository"
button.
Dated project changes are recorded in CHANGELOG.md.