Training Irish-language ASR and TTS models.
reports/
  phase1-report.md   # Phase 1: baseline evals, test set construction, Oireachtas pipeline
  phase2-roadmap.md  # Phase 2: test set inspection, training data pilot
data/
  README.md          # Dataset inventory — sources, stats, status, usage notes
archive/
  phase1/            # All Phase 1 scripts (pipeline, eval, upload, dataset builders)
Phase 2 — see reports/phase2-roadmap.md.
Steps:
- Inspect test sets (CV, Living Audio, Doegen) — listen to samples, decide on quality
- Download + align one Oireachtas debate (<3h) and manually inspect alignment quality
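
For the listening pass over the test sets, it may help to draw a fixed random sample so that the same clips can be revisited later. The sketch below is only an illustration: `pick_inspection_sample`, the clip ID format, and the sample size are hypothetical and not taken from the Phase 1 scripts.

```python
import random


def pick_inspection_sample(clip_ids, k=20, seed=0):
    """Return up to k clip IDs chosen without replacement, reproducibly."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    ids = sorted(clip_ids)     # sort first so input order does not change the draw
    return rng.sample(ids, min(k, len(ids)))


# Hypothetical clip IDs; real ones would come from each test set's metadata.
clips = [f"clip_{i:04d}" for i in range(500)]
sample = pick_inspection_sample(clips, k=5, seed=42)
print(sample)
```

Fixing the seed means a second reviewer listens to the same clips, which makes quality decisions easier to compare.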
See data/README.md for full details.
| Dataset | HF Repo | Status |
|---|---|---|
| Test (CV) | ronanarraig/irish-asr-test-cv | Rebuild needed — used train-contaminated pool |
| Test (Living Audio) | ronanarraig/irish-asr-test-living | Done — inspect quality |
| Test (Doegen) | ronanarraig/irish-asr-test-doegen | Upload in progress — inspect after |
| Test (FLEURS) | ronanarraig/irish-asr-test | Rebuild needed — used dev+test, should be test-only |
| Training | ronanarraig/irish-asr-train | Invalid — Phase 1 bug, needs redo |
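
When rebuilding the CV test set, a quick overlap check against the training data can catch contamination before upload. The sketch below assumes contamination is detected by matching normalised transcripts. That is an assumption, since this README does not say how the contaminated pool was identified, and matching on clip ID or speaker may be more appropriate. The function names and example sentences are hypothetical.

```python
import unicodedata


def normalise(text):
    """Casefold, NFC-normalise, and collapse whitespace so trivial variants match."""
    text = unicodedata.normalize("NFC", text).casefold()
    return " ".join(text.split())


def find_contaminated(train_texts, test_texts):
    """Return test transcripts whose normalised form also occurs in train."""
    train_keys = {normalise(t) for t in train_texts}
    return [t for t in test_texts if normalise(t) in train_keys]


# Hypothetical transcripts; real ones would come from the dataset metadata.
train = ["Dia duit, a chara.", "Tá an aimsir go maith inniu."]
test = ["dia  duit, a chara.", "Cá bhfuil an leabharlann?"]
overlap = find_contaminated(train, test)
print(overlap)
```

An empty result only shows that no transcript is shared under this normalisation. It does not rule out shared speakers or shared audio.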