bert-score

Here are 15 public repositories matching this topic...

End-to-end MLOps pipeline that catches LLM quality regressions before production. Every PR is scored against a versioned golden dataset using BERTScore + ROUGE-L + an LLM-as-Judge rubric, compared to the MLflow production baseline, and shadowed against 5% of live traffic. FastAPI + Celery + TimescaleDB + Streamlit + DVC + GitHub Actions.

  • Updated Jun 15, 2026
  • Python

Comparative evaluation of neural machine translation architectures (MarianMT, mBART-50, NLLB-200, GPT-2) across German, Spanish, and Arabic. Includes multi-metric scoring (BLEU, chrF, METEOR, BERTScore, LaBSE), cross-lingual semantic similarity analysis, LLM-as-judge evaluation via LangChain, and WMT14/OPUS-100 benchmark runs.

  • Updated Jun 19, 2026
  • Python
