TrelisResearch/ga

ga — Irish (Gaeilge) ASR & TTS

Training Irish-language ASR and TTS models.

Structure

reports/
  phase1-report.md        # Phase 1: baseline evals, test set construction, Oireachtas pipeline
  phase2-roadmap.md       # Phase 2: test set inspection, training data pilot
data/
  README.md               # Dataset inventory — sources, stats, status, usage notes
archive/
  phase1/                 # All Phase 1 scripts (pipeline, eval, upload, dataset builders)

Current Phase

Phase 2 — see reports/phase2-roadmap.md.

Steps:

  1. Inspect the test sets (CV, Living Audio, Doegen): listen to samples and judge whether the quality is acceptable
  2. Download and align one Oireachtas debate (<3h), then manually inspect the alignment quality
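
For the listening pass in step 1, one option is to draw a small, reproducible random sample of clips from each test set instead of picking them by hand. The sketch below is only an illustration: the function name `sample_for_listening`, the default sample size and the idea of identifying clips by ID are assumptions, not part of the Phase 2 roadmap.

```python
import random


def sample_for_listening(clip_ids, k=20, seed=0):
    """Return up to k clip IDs, chosen reproducibly, for manual listening.

    Hypothetical helper: the name, k and seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    ids = sorted(clip_ids)     # sort so the result does not depend on input order
    return rng.sample(ids, min(k, len(ids)))
```

Fixing the seed means two reviewers listening to "the sample" hear the same clips, and a later rebuild of a test set can be spot-checked with a different seed.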

Key Datasets

See data/README.md for full details.

| Dataset | HF Repo | Status |
|---|---|---|
| Test (CV) | ronanarraig/irish-asr-test-cv | Rebuild needed: used train-contaminated pool |
| Test (Living Audio) | ronanarraig/irish-asr-test-living | Done; inspect quality |
| Test (Doegen) | ronanarraig/irish-asr-test-doegen | Upload in progress; inspect after |
| Test (FLEURS) | ronanarraig/irish-asr-test | Rebuild needed: used dev+test, should be test-only |
| Training | ronanarraig/irish-asr-train | Invalid: Phase 1 bug, needs redo |
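
When rebuilding the CV test set, a quick sanity check is to confirm that no test transcript also appears in the training pool. The sketch below assumes that contamination shows up as shared transcript text. That is an assumption, since the status notes do not say how the pools overlapped. The function names `normalize` and `find_contaminated` are hypothetical.

```python
import unicodedata


def normalize(text):
    """Lowercase, NFC-normalize and collapse whitespace so trivial variants match."""
    text = unicodedata.normalize("NFC", text).lower()
    return " ".join(text.split())


def find_contaminated(test_sentences, train_sentences):
    """Return the test sentences whose normalized text also occurs in train.

    Assumption: overlap is detected on transcript text only; speaker or
    audio-level overlap would need a separate check.
    """
    train_set = {normalize(s) for s in train_sentences}
    return [s for s in test_sentences if normalize(s) in train_set]
```

An empty result from `find_contaminated` is necessary but not sufficient: the same speaker or recording could still appear in both pools under a different transcript.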
