Hello refiners,
I'm experimenting with the trainer and I'm running into a problem when saving and loading model weights.
The trainer's sequence is:
1. trainer.prepare_models loads the checkpoint into a non-injected model
2. on_train_begin injects the dropout_adapter
3. on_checkpoint_save saves the checkpoint (using model.state_dict())
The names of the dropout-affected layers change in step 2.
As a result, the checkpoint saved in on_checkpoint_save is not compatible with the loading done in trainer.prepare_models, and I cannot smoothly save and reload the model.
Toy example
Injecting the dropout adapter changes the weight keys in state_dict():
from refiners.fluxion.layers.chain import Chain
from refiners.fluxion.layers.linear import Linear
from refiners.training_utils.dropout import DropoutAdapter

network = Chain(
    Linear(2, 3),
)

# Keys before injection
keys = network.state_dict().keys()
print(keys)

# Wrap every Linear layer in a DropoutAdapter
probability = 0.5
for linear, parent in network.walk(Linear):
    DropoutAdapter(target=linear, probability=probability).inject(parent)

# Keys after injection
keys2 = network.state_dict().keys()
print(keys2)
This outputs:
odict_keys(['Linear.weight', 'Linear.bias'])
odict_keys(['DropoutAdapter.Linear.weight', 'DropoutAdapter.Linear.bias'])
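To make the mismatch explicit, here is a small sketch in plain Python (no refiners or torch needed) that compares the two key sets above. The helper name diff_state_dict_keys is hypothetical and not part of refiners; it only mirrors the kind of missing/unexpected key report a strict state dict load would give.

```python
def diff_state_dict_keys(saved_keys, expected_keys):
    """Report which keys a loader would miss and which it would not recognize.

    Hypothetical helper for illustration only, not a refiners API.
    """
    saved, expected = set(saved_keys), set(expected_keys)
    return {
        "missing": sorted(expected - saved),      # expected by the model, absent from the checkpoint
        "unexpected": sorted(saved - expected),   # present in the checkpoint, unknown to the model
    }


# Keys saved after injection (step 3) vs. keys expected by the non-injected model (step 1)
saved = ["DropoutAdapter.Linear.weight", "DropoutAdapter.Linear.bias"]
expected = ["Linear.weight", "Linear.bias"]

report = diff_state_dict_keys(saved, expected)
print(report)
```

Every key ends up in one of the two lists, so none of the saved weights line up with the non-injected model.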
What I'm not clear on is the intended behavior:
A. Should .inject(parent) change the names of the weights, in which case we should fix the save/load sequence in the trainer?
B. Should .inject(parent) leave the names of the weights in state_dict() unchanged when the adapter does not add new weights?
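If option A is the intended behavior, one possible fix on the trainer side would be to normalize the keys before saving. Below is a rough sketch under that assumption, written against plain dicts so it runs without refiners. The function strip_adapter_segments and its adapter_names parameter are hypothetical, and the approach is only valid for adapters that add no weights of their own (as with the dropout adapter here).

```python
def strip_adapter_segments(state_dict, adapter_names=("DropoutAdapter",)):
    """Return a copy of state_dict with adapter segments removed from each key.

    Hypothetical helper, not a refiners API. Assumes the listed adapters
    only wrap existing layers and contribute no parameters themselves.
    """
    remapped = {}
    for key, value in state_dict.items():
        # Drop any dotted path segment that names a weightless adapter
        parts = [part for part in key.split(".") if part not in adapter_names]
        new_key = ".".join(parts)
        if new_key in remapped:
            raise ValueError(f"key collision after stripping adapters: {new_key}")
        remapped[new_key] = value
    return remapped


# Placeholder values stand in for the actual tensors
injected = {
    "DropoutAdapter.Linear.weight": "weight-tensor",
    "DropoutAdapter.Linear.bias": "bias-tensor",
}

remapped = strip_adapter_segments(injected)
print(list(remapped.keys()))
```

With something like this applied in on_checkpoint_save, the saved keys would match what trainer.prepare_models expects for a non-injected model. Option B would make the remapping unnecessary.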
I can help with this if needed.