Simultaneous Machine Translation (SiMT) begins generating the translation while the source sentence is still being read, trading off translation quality against latency. Most SiMT systems require training a separate model for each latency level, which increases computational cost and, more importantly, limits flexibility. Approaches such as the Mixture-of-Experts Wait-k policy train a single model over multiple wait-k values to balance latency and translation quality, but choosing the best value of k for unseen data remains an open challenge. Moreover, structural differences between languages make the problem harder, since a fixed policy can be ineffective across language pairs.
- Base Model: The project will utilize the Mixture-of-Experts Wait-k policy as the backbone model. This policy allows each head of the multi-head attention mechanism to perform translation with different levels of latency.
This repo contains (high level):
- `MLProject_pytorch+SCST+SiMT.ipynb` — primary notebook with model code, experiments and walkthroughs.
- `HMT-SiLLM_2.py` — Python script(s) related to HMT / SiLLM experiments.
- `requirement.txt` — Python package dependencies.
- `test_dataset.json` — example dataset format (input → target pairs).
- `*.pdf` — project writeups and mathematical notes.
- Workflow
- Key contributions
- Repository structure (high level)
- Requirements
- Quick start
- Typical workflow
- Example commands
- Large models & LoRA notes
- Evaluation & metrics
- Reproducibility tips
- Contributing
- References
- Data Format: JSON files with input-output sentence pairs.
- Tokenization: Utilized AutoTokenizer from Hugging Face for sentence processing.
- Padding and Alignment: Padded source sentences and shifted decoder input for alignment.
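A minimal sketch of this preprocessing, assuming the `{"src": ..., "tgt": ...}` pair format and using `t5-small` purely as a stand-in tokenizer checkpoint (substitute the tokenizer that matches your base model):

```python
import json
import torch
from transformers import AutoTokenizer

# Stand-in checkpoint; replace with the tokenizer matching your base model.
tokenizer = AutoTokenizer.from_pretrained("t5-small")

with open("test_dataset.json") as f:
    pairs = json.load(f)  # expected: list of {"src": ..., "tgt": ...} pairs

src_batch = tokenizer([p["src"] for p in pairs], padding=True, return_tensors="pt")
tgt_batch = tokenizer([p["tgt"] for p in pairs], padding=True, return_tensors="pt")

# Shift target ids right by one position to build the decoder input (teacher forcing);
# the first position is filled with the pad id as a stand-in start token.
labels = tgt_batch["input_ids"]
decoder_input_ids = torch.full_like(labels, tokenizer.pad_token_id)
decoder_input_ids[:, 1:] = labels[:, :-1]
```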
- Utilized a flexible wait-k strategy to dynamically adjust latency based on remaining input length.
- Enhanced with HMT to predict sequence likelihoods, improving token generation decisions.
- Wait-k Policy Formula (number of source tokens read before emitting target token $t$, for source length $|x|$):

$$g(t; k) = \min(k + t - 1, |x|)$$
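As a concrete illustration of the formula, a small helper (names are illustrative) that returns how many source tokens have been read before target token $t$ is written:

```python
def wait_k_read_count(t: int, k: int, src_len: int) -> int:
    """g(t; k): number of source tokens read before writing target token t (1-indexed)."""
    return min(k + t - 1, src_len)

# Example: k = 3 on a 6-token source -> read counts 3, 4, 5, 6, 6, ...
schedule = [wait_k_read_count(t, k=3, src_len=6) for t in range(1, 8)]
print(schedule)  # [3, 4, 5, 6, 6, 6, 6]
```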
- Reward Function: Optimized using BLEU and ROUGE metrics.
- Policy Optimization: RL agent trained via policy gradients.
- Advantage Calculation: Based on the difference between sampled and baseline rewards.
- SCST Reward Formula:

$$R(\theta) = \sum_{t=1}^{T} (r_t - b_t) \log P(y_t \mid x; \theta)$$
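A hedged sketch of how such a self-critical update can be implemented, assuming per-sequence rewards for a sampled output and a greedy baseline output (all names are illustrative, not the notebook's exact code):

```python
import torch

def scst_loss(sample_logprobs: torch.Tensor,  # (batch, T): log P(y_t | x) of the sampled sequence
              sample_reward: torch.Tensor,    # (batch,): reward of sampled output, e.g. BLEU - lambda * latency
              baseline_reward: torch.Tensor,  # (batch,): reward of the greedy (baseline) output
              mask: torch.Tensor) -> torch.Tensor:  # (batch, T): 1.0 for real tokens, 0.0 for padding
    # REINFORCE with the greedy reward as baseline: maximise advantage-weighted log-likelihood.
    advantage = (sample_reward - baseline_reward).unsqueeze(1)  # (batch, 1)
    return -(advantage * sample_logprobs * mask).sum() / mask.sum()
```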
- Base Model: LLaMA-7B fine-tuned with LoRA, supported by HMT for improved sequence prediction.
- Optimization: Adam optimizer with cross-entropy loss.
- Device Compatibility: Supports GPU (CUDA), MPS (Apple Silicon), and CPU.
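The device fallback can be expressed in a few lines (a sketch; MPS detection requires PyTorch 1.12+):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple-Silicon MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # PyTorch 1.12+
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
```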
- BLEU Score: Measures translation quality using n-gram overlaps.
- ROUGE-L Score: Assesses informativeness and coverage.
- Latency: Quantified via read-write sequence length ratio.
- Latency Metric, Average Lagging (AL), where $\tau$ is the first target step at which the full source has been read:

$$AL = \frac{1}{\tau} \sum_{t=1}^{\tau} \left[ g(t) - \frac{t-1}{|y| / |x|} \right]$$
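A small helper computing AL from a read/write schedule, following the standard definition above (illustrative, not the notebook's exact implementation):

```python
def average_lagging(delays, src_len, tgt_len):
    """Average Lagging.

    delays[t-1] = number of source tokens read before writing target token t.
    """
    gamma = tgt_len / src_len  # target-to-source length ratio |y| / |x|
    # tau: first target step at which the full source has been read
    tau = next((t for t, g in enumerate(delays, start=1) if g >= src_len), len(delays))
    return sum(delays[t - 1] - (t - 1) / gamma for t in range(1, tau + 1)) / tau

# Example: wait-3 schedule on a 6-token source with a 6-token target -> AL = 3.0
print(average_lagging([3, 4, 5, 6, 6, 6], src_len=6, tgt_len=6))
```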
- Dynamic Wait-k Policy significantly improved latency-quality trade-offs.
- SCST Fine-Tuning optimized performance through reinforcement learning.
- HMT Integration enhanced real-time adaptability.
- LoRA-enhanced LLaMA model ensured resource-efficient translations.
- BLEU and ROUGE scores provided robust evaluation metrics.
- Adaptive Wait‑k implementation (dynamic wait decisions based on state/features).
- SCST (reinforcement learning) fine‑tuning to directly optimise quality‑latency reward.
- Integration of HMT building blocks and LoRA adapters for parameter‑efficient fine‑tuning of large models.
- End‑to‑end notebooks and scripts for training, evaluation and analysis.
- Python 3.8+
- GPU recommended (NVIDIA CUDA) for model fine‑tuning.
- Key Python libraries (suggested):
`torch`, `transformers`, `datasets`, `sacrebleu`, `sentencepiece` (if using byte-pair tokenizers), `accelerate` (optional), `einops`, `numpy`, `tqdm`.
Install example:
```bash
python -m pip install -r requirement.txt
```

If you use LoRA code from `peft` or `loralib`, install those packages as well (see `requirement.txt`).
- Clone the repository and enter the folder:
```bash
git clone https://github.com/Tanmay-IITDSAI/MLProject.git
cd MLProject
```

- Create & activate a virtual environment (optional but recommended), then install dependencies:

```bash
python -m venv venv
# macOS / Linux
source venv/bin/activate
# Windows (PowerShell)
.\venv\Scripts\Activate.ps1

pip install -r requirement.txt
```

- Inspect the example dataset:

```bash
less test_dataset.json
# or open in your editor / notebook
```

- Open and run the main notebook:

```bash
jupyter notebook MLProject_pytorch+SCST+SiMT.ipynb
```

The notebook walks through data loading, scalar/sequence metrics, model training (baseline), SCST fine-tuning, and evaluation.
- Inspect / preprocess the dataset; convert it to the expected JSON format (a list of `{"src": ..., "tgt": ...}` pairs; see the sketch after this list).
- Train a baseline simultaneous model or configure a pre-trained model for online decoding.
- Run adaptive Wait‑k policy training (supervised / imitation learning stage).
- Apply SCST for reward‑based fine‑tuning to trade off BLEU vs latency.
- Evaluate with BLEU/ROUGE and latency metrics (Average Lagging, Consecutive Waits, etc.).
- Optionally apply LoRA adapters and re‑run experiments with large models.
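For reference, a minimal sketch of producing a dataset file in the expected pair format (the sentences and file name are purely illustrative):

```python
import json

# Illustrative pairs; field names follow the {"src": ..., "tgt": ...} format described above.
pairs = [
    {"src": "Wie geht es dir heute?", "tgt": "How are you today?"},
    {"src": "Das Wetter ist schön.", "tgt": "The weather is nice."},
]

with open("my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(pairs, f, ensure_ascii=False, indent=2)
```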
These commands are illustrative — check each script's --help for exact flags.
```bash
# Run a training script (small-scale demo)
python HMT-SiLLM_2.py --data test_dataset.json --epochs 5 --batch_size 16 --lr 1e-4 --save_dir checkpoints/demo

# Evaluate a saved checkpoint
python evaluate.py --model checkpoints/demo/best.pt --test_data test_dataset.json --metrics bleu,avg_lagging

# Run the notebook non-interactively (nbconvert) to execute cells
jupyter nbconvert --to notebook --execute MLProject_pytorch+SCST+SiMT.ipynb --output executed.ipynb
```

- The repo references experiments with large models (e.g., the LLaMA family). Model weights are not included; obtain them separately and ensure you follow licensing requirements.
- LoRA (Low‑Rank Adaptation) adapters are used to limit the number of trainable parameters. This is helpful when fine‑tuning large models on limited hardware.
- For LoRA training: use mixed precision (AMP), gradient accumulation and multi‑GPU if available.
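If you use `peft` for the adapters, a minimal configuration sketch might look like the following; the checkpoint path is a placeholder and the target modules depend on the architecture:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder path: substitute the LLaMA weights you have obtained and are licensed to use.
base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b", torch_dtype=torch.float16)

lora_cfg = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; adjust per architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()        # only the adapter weights are trainable
```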
- Quality: BLEU, SacreBLEU, ROUGE (where applicable), and human/LLM judgments.
- Latency: Average Lagging (AL), Average Proportion (AP), and other simultaneous translation metrics.
- Reward design: SCST optimises a composite reward (e.g., BLEU − λ × latency). The notebook contains examples of reward formulations and hyperparameters.
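One way such a composite reward can be expressed (a sketch; the λ weight and the use of Average Lagging as the latency term are illustrative assumptions):

```python
import sacrebleu

def composite_reward(hypothesis: str, reference: str, avg_lagging: float, lam: float = 0.05) -> float:
    """Reward = sentence BLEU minus a latency penalty (illustrative weighting)."""
    bleu = sacrebleu.sentence_bleu(hypothesis, [reference]).score  # 0-100 scale
    return bleu - lam * avg_lagging
```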
- Fix random seeds (`numpy`, `torch`, `random`) and log seed values in run configs (see the snippet after this list).
- Pin package versions in `requirement.txt` (or provide an `environment.yml`).
- Use smaller subsets for debugging and only scale up after pipeline correctness is verified.
- Save model checkpoints, training logs and hyperparameter configs alongside results.
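A seed-fixing snippet along these lines (illustrative):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix seeds across the libraries used here; log the value in your run config."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
```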
Contributions, issues and PRs are welcome. Suggested improvements:
- Add robust CLI docs and a configuration system (e.g., Hydra / OmegaConf).
- Add unit tests for data preprocessing and evaluation metrics.
- Integrate lightweight experiment tracking (Weights & Biases, MLflow) for reproducibility.
- Use the HMT Transformer for further analysis with the Adaptive Wait-k policy (the repo does not yet include the corresponding requirements file).
- Zhang, S., & Feng, Y. (2021). Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 7306–7317.
- Gu, J., et al. (2017). Learning to Translate in Real-time with Neural Machine Translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL).
- Grissom II, A., He, H., Boyd-Graber, J., Morgan, J., & Daumé III, H. (2014). Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1342–1352.