@ChingTsai ChingTsai commented Jan 14, 2026

Description

  • Enable gradient clipping in the MaxText.sft.sft_trainer optimizer.
    • Currently, setting gradient_clipping_threshold has no effect in MaxText.sft.sft_trainer.
  • We tested this configuration on our Qwen3-4B SFT workload, and it significantly accelerated training convergence.

FIXES: b/475340254

Tests

python3 -m MaxText.sft.sft_trainer \
    src/MaxText/configs/sft.yml \
    run_name=$(date +%Y-%m-%d-%H-%M-%S) \
    base_output_directory=<xxx> \
    model_name=qwen3-4b \
    load_parameters_path=<xxx>/0/items \
    tokenizer_path=Qwen/Qwen3-4B \
    steps=53 \
    profiler=xplane \
    hf_path=arrow \
    dataset_type=hf \
    train_split=train \
    hf_train_files=<xxx>.arrow \
    hf_eval_files=<xxx>.arrow \
    per_device_batch_size=16 \
    max_target_length=1024 \
    learning_rate=5e-6 \
    warmup_steps_fraction=0.05 \
    gradient_clipping_threshold=1 \
    weight_dtype=bfloat16

Checklist

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.
