Comparative evaluation of neural machine translation architectures (MarianMT, mBART-50, NLLB-200, GPT-2) across German, Spanish, and Arabic. Includes multi-metric scoring (BLEU, chrF, METEOR, BERTScore, LaBSE), cross-lingual semantic similarity analysis, LLM-as-judge evaluation via LangChain, and WMT14/OPUS-100 benchmark runs.
Updated Jun 19, 2026 - Python