Fix Chronos-2 fine-tuning to preserve loaded GPU device index #471

Open

dario-fumarola wants to merge 1 commit into amazon-science:main from dario-fumarola:fix/issue-457-respect-gpu-index

Conversation

@dario-fumarola
Summary

  • Override `Chronos2Trainer._move_model_to_device` to preserve the model's existing CUDA device when the model is already loaded on a specific GPU and no `hf_device_map` is set.
  • This prevents `transformers.Trainer` from moving a model loaded on, e.g., `cuda:5` back to `cuda:0`.
  • Add dedicated unit tests for the CUDA-preservation behavior and its guard cases.
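The guard described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: the helper name `should_preserve_device` and the mixin are hypothetical, while `_move_model_to_device(model, device)` is the real `transformers.Trainer` hook being overridden.

```python
import torch
from torch import nn


def should_preserve_device(model: nn.Module, target) -> bool:
    """Return True when the model should stay on its current CUDA device.

    Preserve the device only when the model is already on a specific GPU,
    no hf_device_map is spreading it across devices, and the requested
    target is itself a CUDA device (so CPU moves still go through).
    """
    current = next(model.parameters()).device
    return (
        current.type == "cuda"
        and getattr(model, "hf_device_map", None) is None
        and torch.device(target).type == "cuda"
    )


class DevicePreservingMixin:
    """Hypothetical mixin for a transformers.Trainer subclass."""

    def _move_model_to_device(self, model, device):
        if should_preserve_device(model, device):
            # Model already lives on e.g. cuda:5 -- leave it there instead
            # of letting the default logic move it onto cuda:0.
            return
        super()._move_model_to_device(model, device)
```

A model on `cuda:5` with no `hf_device_map` would be left in place; a CPU-resident model would still be moved normally.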

Why

Issue #457 reports that Chronos-2 fine-tuning fails when the model is loaded on a non-zero GPU index because the default Trainer move logic forces the model onto cuda:0.

Validation

  • uv run --python 3.11 python -m pytest test/test_chronos2_trainer.py
  • uv run --python 3.11 python -m pytest test/test_chronos2.py -k "pipeline_can_be_finetuned or two_step_finetuning_with_df_input_works"
  • uv run --python 3.11 mypy src test

Fixes #457

Development

Successfully merging this pull request may close these issues.

Chronos-2 fine-tuning does not respect the GPU index