This repository was archived by the owner on Sep 26, 2025. It is now read-only.

DropoutAdapter injection changes the keys of state_dict() and breaks smooth checkpoint loading/saving #226

@piercus

Description


Hello refiners,

I'm experimenting with the trainer, and I'm running into a problem loading and saving model weights.

The trainer runs the following sequence:

  1. trainer.prepare_models loads the checkpoint into a non-injected model.
  2. on_train_begin injects the DropoutAdapter.
  3. on_checkpoint_save saves the checkpoint (using model.state_dict()).

Step 2 changes the names of the layers wrapped by the dropout adapter.

As a result, checkpoints saved in on_checkpoint_save are not compatible with the loading done in trainer.prepare_models, so I cannot smoothly save and reload the model.
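To make the incompatibility concrete, here is a minimal pure-Python sketch that compares the two key sets the way strict loading does (PyTorch's `load_state_dict` reports such a mismatch as missing and unexpected keys). The helper `diff_state_dict_keys` is hypothetical and only looks at key names, not tensors; the key lists are taken from the toy example's output.

```python
def diff_state_dict_keys(
    expected: list[str], loaded: list[str]
) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) keys, similar in spirit to strict loading."""
    expected_set, loaded_set = set(expected), set(loaded)
    # Keys the model needs but the checkpoint does not provide.
    missing = sorted(expected_set - loaded_set)
    # Keys the checkpoint provides but the model does not know about.
    unexpected = sorted(loaded_set - expected_set)
    return missing, unexpected


# Keys of the non-injected model built by prepare_models (step 1).
model_keys = ["Linear.weight", "Linear.bias"]
# Keys written by on_checkpoint_save after the DropoutAdapter was injected (steps 2 and 3).
checkpoint_keys = ["DropoutAdapter.Linear.weight", "DropoutAdapter.Linear.bias"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

Every key ends up on both lists, which is why the checkpoint cannot be reloaded as-is.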

Toy example

Injecting the dropout adapter changes the weight keys returned by state_dict():

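```python
from refiners.fluxion.layers.chain import Chain
from refiners.fluxion.layers.linear import Linear
from refiners.training_utils.dropout import DropoutAdapter

# A minimal model: a single Linear layer inside a Chain.
network = Chain(
    Linear(2, 3)
)

# Keys before injection.
keys = network.state_dict().keys()
print(keys)

probability = 0.5

# Wrap every Linear in a DropoutAdapter (collect the matches first so the
# tree is not modified while it is being walked).
for linear, parent in list(network.walk(Linear)):
    DropoutAdapter(target=linear, probability=probability).inject(parent)

# Keys after injection.
keys2 = network.state_dict().keys()
print(keys2)
```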

outputs:

```
odict_keys(['Linear.weight', 'Linear.bias'])
odict_keys(['DropoutAdapter.Linear.weight', 'DropoutAdapter.Linear.bias'])
```
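One trainer-side workaround is to normalize the keys by dropping the adapter segment before saving (or before loading). The following is a minimal sketch on plain dictionaries; the helper `strip_adapter_segments` and its regular expression are hypothetical, and the `DropoutAdapter_<n>` suffix form is an assumption about how refiners names sibling layers of the same type.

```python
import re
from typing import Mapping, TypeVar

V = TypeVar("V")

# Hypothetical pattern: matches a key segment "DropoutAdapter" or "DropoutAdapter_<n>"
# (the numeric suffix form is an assumption, not confirmed refiners behavior).
ADAPTER_SEGMENT = re.compile(r"^DropoutAdapter(_\d+)?$")


def strip_adapter_segments(state_dict: Mapping[str, V]) -> dict[str, V]:
    """Remove DropoutAdapter segments from every key, refusing to merge two keys into one."""
    stripped: dict[str, V] = {}
    for key, value in state_dict.items():
        parts = [part for part in key.split(".") if not ADAPTER_SEGMENT.match(part)]
        new_key = ".".join(parts)
        if new_key in stripped:
            # e.g. two sibling adapters that both wrap a child named "Linear"
            raise ValueError(f"stripping adapters maps two keys onto {new_key!r}")
        stripped[new_key] = value
    return stripped


injected = {"DropoutAdapter.Linear.weight": "w", "DropoutAdapter.Linear.bias": "b"}
print(list(strip_adapter_segments(injected)))
```

Raising on collisions is deliberate: if several sibling layers were wrapped, their pre-injection names cannot be recovered from the strings alone, so a robust fix would build the key mapping from the model structure (for example by recording the keys before injection) instead of rewriting strings.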

What I'm not clear about is the intended behavior:
A. Should .inject(parent) change the weight names, in which case the save/load sequence in the trainer should be fixed?
B. Should .inject(parent) leave the weight names in state_dict() unchanged when the adapter does not add any new weights?
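Option B can be illustrated without refiners or PyTorch using a toy tree of hypothetical `Node` objects: a parameter-free wrapper marked `transparent` contributes no segment to the dotted key path, so wrapping a layer leaves its keys unchanged. This is a sketch of the naming rule only; it does not model how a real `state_dict()` would be customized.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a layer: a name, its own parameters, its children."""

    name: str
    params: dict[str, float] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)
    # Option B: a wrapper that owns no parameters adds no segment to the keys.
    transparent: bool = False


def state_dict_keys(children: list[Node], prefix: str = "") -> list[str]:
    """Collect dotted parameter keys below a root, skipping transparent nodes' names."""
    keys: list[str] = []
    for child in children:
        path = prefix if child.transparent else f"{prefix}{child.name}."
        keys.extend(path + param for param in child.params)
        keys.extend(state_dict_keys(child.children, path))
    return keys


linear = Node("Linear", params={"weight": 0.0, "bias": 0.0})
plain = [linear]
opaque = [Node("DropoutAdapter", children=[linear])]
transparent = [Node("DropoutAdapter", children=[linear], transparent=True)]

print(state_dict_keys(plain))
print(state_dict_keys(opaque))
print(state_dict_keys(transparent))
```

The opaque wrapper reproduces the current behavior, while the transparent one yields the same keys as the plain model. A real implementation would also need to keep any name a layer had before injection (such as a suffix distinguishing siblings); this toy tree names every node explicitly, so it sidesteps that.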

I can help with this if needed.
