Official implementation of "Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training"
Long reasoning models often struggle with multilingual settings: they tend to reason in English for non-English questions, and when forced to reason in the question language, performance drops substantially. TRIT addresses this by integrating translation training directly into multilingual reasoning through a self-improving reinforcement learning framework.
Key Innovation: TRIT creates a closed feedback loop where:
- Translation provides multilingual question data for reasoning
- Reasoning accuracy provides the quality signal for translation (see the sketch after this list)
- No external feedback or additional multilingual data required
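As a rough illustration of this loop, here is a minimal sketch assuming hypothetical `translate` and `reason` callables and a 0/1 exact-match reward; it is not the repository's actual reward code:

```python
# Illustrative sketch of the TRIT feedback loop: the model translates a question,
# reasons over its own translation, and the reasoning outcome is reused as the
# translation reward. Function names and signatures are assumptions.
from typing import Callable

def trit_step(
    question_en: str,
    gold_answer: str,
    translate: Callable[[str, str], str],   # (question, target_lang) -> translation
    reason: Callable[[str], str],            # translated question -> final answer
    target_lang: str = "zh",
) -> dict:
    """One conceptual rollout of the translation-reasoning loop."""
    translated = translate(question_en, target_lang)
    answer = reason(translated)
    reasoning_reward = float(answer.strip() == gold_answer.strip())
    # No external judge: reasoning accuracy doubles as the translation quality signal.
    return {
        "translation": translated,
        "reasoning_reward": reasoning_reward,
        "translation_reward": reasoning_reward,
    }
```

Because the same rollout produces both signals, the translation task needs no external judge and no reference translations.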
Two-Stage Process:
- Stage 1, Cross-Lingual Reasoning: filter questions by an accuracy threshold to ensure reliable feedback
- Stage 2, Translation-Reasoning Integration: train translation and reasoning jointly so that each improves the other
All tasks are optimized using GRPO (Group Relative Policy Optimization).
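The core of GRPO is a group-relative advantage: rewards for a group of rollouts sampled from the same question are normalized by the group mean and standard deviation, so no value network is needed. A minimal sketch (the actual training loop lives in VeRL; the function name here is an assumption):

```python
# Minimal sketch of the group-relative advantage used by GRPO (illustrative only).
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize rewards within a group of rollouts sampled for the same question.

    rewards: shape (G,) -- one scalar reward per rollout in the group.
    Returns per-rollout advantages; no learned value function is required.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 rollouts for one question, rewarded 1 if the final answer is correct
# (and, in TRIT, the reasoning stays in the question language), else 0.
advs = group_relative_advantages(np.array([1.0, 0.0, 1.0, 0.0]))
```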
TRIT achieves:
- +7 percentage points average improvement over SLC-RL baseline across three models
- +5 percentage points over M-Thinker on Qwen3 models
- Near-perfect language consistency (>99%) across all settings
TRIT improves translation quality both in-domain and out-of-domain:
- In-domain (MATH500): Up to 3.3:1 win-to-loss ratio vs baseline
- Out-of-domain (FLORES-200): Up to +8.4 COMET points
Translation training induces question-level alignment (one way to probe this is sketched after the list below):
- +15.9 percentage points improvement in final-layer similarity (DeepSeek-Distill-1.5B)
- Substantially higher alignment across all model layers compared to External-Translation baseline
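As a hedged sketch of how such alignment can be probed, the snippet below compares mean-pooled hidden states of a question and its translation with cosine similarity; the checkpoint id, pooling choice, and layer index are assumptions rather than the paper's exact protocol.

```python
# Rough sketch of probing question-level alignment: cosine similarity between
# mean-pooled hidden states of a question and its translation.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint id
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

def question_embedding(text: str, layer: int = -1) -> torch.Tensor:
    """Mean-pool token states from one hidden layer (default: final layer)."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

en = question_embedding("What is the sum of the first 10 positive integers?")
zh = question_embedding("前10个正整数的和是多少？")
alignment = torch.cosine_similarity(en, zh, dim=0).item()
```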
TRIT remains effective even when reasoning language is not constrained:
- 52.1% accuracy when models can reason in any language (Qwen3-1.7B)
- +4.1pp improvement over SLC-RL in flexible setting
- Demonstrates TRIT improves question understanding, not just language consistency
Optimal filtering threshold θ = 1/3 balances noise reduction and data retention.
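A minimal sketch of this filter, under the assumption that a question is kept when its sampled-rollout accuracy reaches θ (the helper names and interface are hypothetical):

```python
# Hedged sketch of the Stage-1 accuracy filter: keep a question only if the model's
# rollout accuracy reaches theta, so the reasoning reward on it is a reliable signal.
from typing import Callable, List

THETA = 1 / 3  # filtering threshold reported as optimal in the paper

def filter_questions(
    questions: List[dict],
    rollout_accuracy: Callable[[dict], float],  # assumed: fraction of correct rollouts
) -> List[dict]:
    """Keep questions whose sampled-rollout accuracy is at least THETA."""
    return [q for q in questions if rollout_accuracy(q) >= THETA]
```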
git clone https://github.com/NJUNLP/TRIT.git
cd TRIT
pip install -r requirements.txt
Download training data from Hugging Face.
Stage 1: Cold-start Training
We use LlamaFactory for supervised fine-tuning. Configuration: scripts/sft.yaml
llamafactory-cli train scripts/sft.yaml
Stage 2: TRIT Training
We use VeRL for reinforcement learning. Example script: scripts/example.sh
bash scripts/example.sh
- Translation-Reasoning Integration is the Core Innovation
  - Translation training improves question understanding
  - Reasoning feedback guides translation quality
  - Joint optimization creates a self-improving loop
- Question-Level Alignment Matters
  - TRIT induces aligned cross-lingual representations
  - External translations alone don't achieve this alignment
  - Alignment correlates with reasoning improvements
- Framework is Flexible and Robust
  - Works across models with varying multilingual capabilities
  - Effective in both constrained and flexible reasoning settings
  - Supports iterative training for continual improvement
This work is supported by the National Natural Science Foundation of China (No. 62376116), the research project of the Nanjing University-China Mobile Joint Institute (NJ20250038), and the Fundamental Research Funds for the Central Universities (No. 2024300507).
We thank the authors of DAPO-MATH, M-Thinker, and GRPO for their open-source contributions.
This project is licensed under the MIT License - see the LICENSE file for details.





