Forced Alignment Evaluation Toolkit

Computes standard phonetic alignment quality metrics by comparing an automatically aligned Praat TextGrid against a human-annotated reference.

Metrics

| Metric | Description | Key Reference |
|--------|-------------|---------------|
| Boundary Displacement | Absolute time difference (ms) between each reference boundary and the nearest hypothesis boundary. Reports mean, median, std, and percentage within 10/20/25/50/100 ms thresholds. | McAuliffe et al. (2017) |
| Intersection over Union (IoU) | Temporal overlap between matched segments, computed per phone/word. | Gonzalez et al. (2020) |
| Phone Error Rate (PER) | Levenshtein edit distance between phone label sequences, normalised by reference length. | Standard ASR evaluation |
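The three metrics can be sketched in plain Python. This is a minimal illustration of the definitions above, not the toolkit's actual implementation in src/metrics.py; the (start, end, label) interval representation is an assumption for the example.

```python
# Minimal sketches of the three metrics. Intervals are (start_s, end_s, label)
# tuples; boundaries are plain floats in seconds.

def boundary_displacements(ref_bounds, hyp_bounds):
    """Absolute distance (ms) from each reference boundary to the
    nearest hypothesis boundary."""
    return [min(abs(r - h) for h in hyp_bounds) * 1000.0 for r in ref_bounds]

def interval_iou(a, b):
    """Temporal intersection-over-union of two (start, end, label) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def phone_error_rate(ref, hyp):
    """Levenshtein edit distance between label sequences,
    normalised by reference length (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1] / len(ref)
```

For example, swapping one phone out of three gives a PER of 1/3, and two half-second segments offset by 0.25 s have an IoU of 1/3.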

Project Structure

alignment-eval-project/
├── .vscode/
│   └── launch.json          # VS Code debug configurations
├── data/
│   └── examples/            # Demo TextGrid files (auto-generated)
├── outputs/
│   ├── logs/                # Timestamped log files
│   └── reports/             # Evaluation reports (txt + csv)
├── src/
│   ├── __init__.py          # Package metadata
│   ├── __main__.py          # python -m src entry point
│   ├── main.py              # CLI and evaluation pipeline
│   ├── loader.py            # TextGrid loading utilities
│   ├── metrics.py           # Metric computations
│   ├── reporting.py         # Report formatting and file output
│   ├── log_config.py        # Logging setup
│   └── demo.py              # Demo TextGrid generator
├── tests/                   # (placeholder for unit tests)
├── .gitignore
├── requirements.txt
└── README.md

Installation

# Clone the repository
git clone <repo-url>
cd alignment-eval-project

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Linux/macOS
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

Usage

Run the demo

python -m src --demo

This generates synthetic reference and hypothesis TextGrid files in data/examples/, runs the evaluation, prints results to the console, and saves reports to outputs/reports/ and logs to outputs/logs/.
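For reference, a minimal Praat long-format TextGrid with a single interval tier can be written like this. This is an illustration of the file format the toolkit consumes, not the code in src/demo.py, which may format its files differently.

```python
# Write a minimal Praat long-format TextGrid with one interval tier.
# Purely illustrative: src/demo.py may produce different content.

def write_textgrid(path, tier_name, intervals):
    """intervals: contiguous list of (xmin, xmax, label) tuples."""
    xmin, xmax = intervals[0][0], intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (lo, hi, label) in enumerate(intervals, 1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {lo}",
            f"            xmax = {hi}",
            f'            text = "{label}"',
        ]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

write_textgrid("demo_ref.TextGrid", "phones",
               [(0.0, 0.30, "sil"), (0.30, 0.55, "k"), (0.55, 1.00, "ae")])
```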

Evaluate your own TextGrids

# Basic evaluation (phone tier)
python -m src --ref data/my_reference.TextGrid --hyp data/my_hypothesis.TextGrid --tier phones

# Word-level evaluation
python -m src --ref data/my_reference.TextGrid --hyp data/my_hypothesis.TextGrid --tier words

# Save report and CSV to outputs/reports/
python -m src --ref data/my_reference.TextGrid --hyp data/my_hypothesis.TextGrid --tier phones --save

# Exclude silence-adjacent boundaries
python -m src --ref data/my_reference.TextGrid --hyp data/my_hypothesis.TextGrid --tier phones --exclude-silence --save

VS Code

Open the project in VS Code and use the debug configurations in .vscode/launch.json:

  • Run Demo — runs with built-in example data
  • Evaluate Phones — evaluates the example phone tier
  • Evaluate Words — evaluates the example word tier
  • Evaluate (Pick Files) — prompts you for file paths and tier name

Programmatic usage

from src.main import evaluate_alignment

results = evaluate_alignment(
    ref_path="data/reference.TextGrid",
    hyp_path="data/hypothesis.TextGrid",
    tier_name="phones",
)

# Access individual metrics
bd = results["boundary_displacement"]
print(f"Median displacement: {bd['median_ms']:.1f} ms")
print(f"Within 25 ms: {bd['pct_within_25ms']:.0f}%")

iou = results["iou"]
print(f"Mean IoU: {iou['mean_iou']:.3f}")

per = results["phone_error_rate"]
print(f"PER: {per['per']:.1%}")

Interpretation Guide

  • Boundary displacement: MFA typically achieves ~12–17 ms median on English benchmarks (Buckeye, TIMIT). Human inter-annotator agreement is ~10–13 ms. If your aligner is within this range, it is performing at near-human level.
  • IoU: 1.0 = perfect overlap. Values above 0.8 are generally good for phone-level alignment.
  • PER: 0.0 = perfect label match. Analogous to Word Error Rate (WER) in ASR evaluation.
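The rules of thumb above can be turned into a quick sanity check. The thresholds are taken directly from this guide, and the results-dict keys mirror the programmatic-usage example; treat this as a sketch, not part of the toolkit's API.

```python
def interpret(results):
    """Apply the rule-of-thumb thresholds from the interpretation guide."""
    bd = results["boundary_displacement"]["median_ms"]
    iou = results["iou"]["mean_iou"]
    notes = []
    # ~10-13 ms is human inter-annotator agreement; ~12-17 ms is typical MFA.
    notes.append("near human-level" if bd <= 13 else
                 "typical MFA range" if bd <= 17 else
                 "worse than typical MFA")
    # IoU above 0.8 is generally good for phone-level alignment.
    notes.append("good phone-level overlap" if iou > 0.8 else
                 "check segment overlap")
    return notes

print(interpret({"boundary_displacement": {"median_ms": 12.0},
                 "iou": {"mean_iou": 0.85}}))
```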

References

  • McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2017). Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. Interspeech 2017, 498–502.
  • Gonzalez, S., Grama, J., & Travis, C. E. (2020). Comparing the performance of forced aligners used in sociophonetic research. Linguistics Vanguard, 6(1).
  • Kelley, M. C., et al. (2023). MAPS: A Mason-Alberta Phonetic Segmenter. Interspeech 2023.
  • Rousso, T., et al. (2024). Evaluating forced alignment tools. Speech Communication.

License

MIT
