Refactor model loading in distillation to remove Tunix adapter. #2927
Description
Refactors the distillation trainer to remove the `TunixMaxTextAdapter` dependency, enabling direct control over model execution and configuration.

Key Changes:
- `nnx.Module` loading via `get_maxtext_model`.
- `create_forward_fn` generates distinct forward functions for Student and Teacher. This correctly binds `enable_dropout` from the config, allowing the Student to train with dropout while keeping the Teacher deterministic (see the sketch below).
- `teacher_overrides.load_parameters_path` is now mandatory to ensure reproducibility.
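The factory pattern described above can be illustrated with a minimal sketch. `TinyModel` is a hypothetical stand-in for the `nnx.Module` returned by `get_maxtext_model`, and the body of `create_forward_fn` is an assumption based on this description, not the actual MaxText implementation:

```python
import jax.numpy as jnp
from flax import nnx


class TinyModel(nnx.Module):
  """Hypothetical stand-in for the nnx.Module returned by get_maxtext_model."""

  def __init__(self, rngs: nnx.Rngs):
    self.dense = nnx.Linear(8, 8, rngs=rngs)
    self.dropout = nnx.Dropout(rate=0.1, rngs=rngs)

  def __call__(self, x, *, enable_dropout: bool):
    h = self.dense(x)
    # Dropout is active only when the caller asks for it.
    return self.dropout(h, deterministic=not enable_dropout)


def create_forward_fn(enable_dropout: bool):
  """Returns a forward fn with enable_dropout bound from the config."""

  def forward_fn(model: nnx.Module, x: jnp.ndarray) -> jnp.ndarray:
    return model(x, enable_dropout=enable_dropout)

  return forward_fn


# Student trains with dropout; Teacher stays deterministic.
student_forward = create_forward_fn(enable_dropout=True)
teacher_forward = create_forward_fn(enable_dropout=False)

model = TinyModel(nnx.Rngs(0))
x = jnp.ones((2, 8))
print(student_forward(model, x).shape)  # (2, 8)
print(teacher_forward(model, x).shape)  # (2, 8)
```

Similarly, the now-mandatory `teacher_overrides.load_parameters_path` could be enforced with a fail-fast check at config-resolution time; the field access below mirrors this description, not the real MaxText config schema:

```python
def resolve_teacher_checkpoint(teacher_overrides: dict) -> str:
  """Fails fast when no fixed Teacher checkpoint is configured."""
  path = teacher_overrides.get("load_parameters_path")
  if not path:
    raise ValueError(
        "teacher_overrides.load_parameters_path is required so the Teacher "
        "always loads the same checkpoint and runs stay reproducible."
    )
  return path
```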
Verified locally with the following command:
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [ ] Add the `gemini-review` label if you would like a Gemini review.