Description
Hi,
Thank you for sharing this great repository!
I'm currently trying to run fine-tuning using a relatively large dataset (~60k records). However, I'm encountering an out-of-memory (OOM) error with PyTorch.
Could you please advise on how to modify the fine-tuning code to run in smaller batches without conflicting with other parts of the codebase?
Here is the error message I'm getting:
torch.Size([2, 2048])
torch.Size([2, 2048])
FCNNetwork build torch.Size([2, 1])
0.001
Inner Loop parameters
names_learning_rates_dict.layer_dict-linear-weights torch.Size([6]) True
Outer Loop parameters
regressor.layer_dict.linear0.linear.weights torch.Size([2048, 2048]) True
regressor.layer_dict.linear0.linear.bias torch.Size([2048]) True
regressor.layer_dict.linear0.norm_layer.bias torch.Size([5, 2048]) True
regressor.layer_dict.linear0.norm_layer.weight torch.Size([5, 2048]) True
regressor.layer_dict.linear1.linear.weights torch.Size([2048, 2048]) True
regressor.layer_dict.linear1.linear.bias torch.Size([2048]) True
regressor.layer_dict.linear1.norm_layer.bias torch.Size([5, 2048]) True
regressor.layer_dict.linear1.norm_layer.weight torch.Size([5, 2048]) True
regressor.layer_dict.linear.weights torch.Size([1, 2048]) True
inner_loop_optimizer.names_learning_rates_dict.layer_dict-linear-weights torch.Size([6]) True
run_actfound_V1.py:30: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:274.)
x_task = torch.tensor(xs)
Traceback (most recent call last):
File "run_actfound_V1.py", line 249, in <module>
main()
File "run_actfound_V1.py", line 246, in main
run_prediction(args, model)
File "run_actfound_V1.py", line 169, in run_prediction
y_pred, _ = model.run_predict(x_task, y_task_input, split)
File "Actfound_demo/system_actfound_original.py", line 23, in run_predict
names_weights_copy, support_loss_each_step, _ = self.inner_loop(x_task, y_task, 0, split,False, -1, num_steps)
File "Actfound_demo/system_base.py", line 152, in inner_loop
support_loss, support_preds = self.net_forward(x=x_task,
File "Actfound_demo/system_actfound_original.py", line 68, in net_forward
ddg_pred = support_value.unsqueeze(-1) - support_value.unsqueeze(0)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.40 GiB. GPU
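A rough sanity check on that allocation size, in case it helps narrow things down. I am assuming here that `support_value` holds one float32 value (4 bytes) per record, so that the subtraction in `net_forward` broadcasts an N-vector against itself into a dense N x N matrix. The helper name below is my own and not part of the repository.

```python
def pairwise_matrix_gib(n_records: int, bytes_per_element: int = 4) -> float:
    """Memory in GiB needed for one dense n x n matrix (float32 by default)."""
    return n_records * n_records * bytes_per_element / 2**30


# Estimate for a task with roughly 60k records, as in my dataset.
print(f"{pairwise_matrix_gib(60_000):.2f} GiB")
```

For about 60,000 records this comes out at roughly 13.4 GiB, which is in line with the failed allocation. If the assumption holds, memory grows quadratically with the number of records in a single task, so halving the task size would cut this matrix to about a quarter.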
########
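To make the question more concrete, this is the kind of change I have in mind: computing the pairwise difference from `net_forward` in row blocks instead of all at once. It is only a sketch under my own assumptions. The function name `pairwise_diff_chunks` and the `chunk_size` parameter are hypothetical, and I have not checked how the loss that consumes `ddg_pred` would need to be adapted to work block by block.

```python
import torch


def pairwise_diff_chunks(values: torch.Tensor, chunk_size: int = 4096):
    """Yield (start, block) with block[i, j] = values[start + i] - values[j].

    Each block has at most chunk_size rows, so only a
    (chunk_size x N) slice of the full N x N matrix exists at a time.
    """
    n = values.shape[0]
    for start in range(0, n, chunk_size):
        rows = values[start:start + chunk_size]
        # Same broadcasting as the original expression, restricted to a row block.
        yield start, rows.unsqueeze(-1) - values.unsqueeze(0)


# Small check that the blocks reproduce the full pairwise matrix.
values = torch.arange(10, dtype=torch.float32)
full = values.unsqueeze(-1) - values.unsqueeze(0)
rebuilt = torch.cat(
    [block for _, block in pairwise_diff_chunks(values, chunk_size=3)], dim=0
)
assert torch.equal(full, rebuilt)
```

I suspect this alone would not be enough inside `inner_loop`, because autograd would still keep every block alive until the backward pass unless the loss is accumulated and backpropagated per block. If that is not practical, would subsampling the support set for each task be an acceptable alternative?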
Thanks