【Hackathon 10th Spring No.9】NewtonNet reproduction #262

Open
co63oc wants to merge 4 commits into PaddlePaddle:develop from co63oc:fix1
co63oc commented Apr 8, 2026

PaddlePaddle/Paddle#77429
NewtonNet reproduction

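PyTorch version
https://github.com/co63oc/NewtonNet/tree/fix1 (fix1 branch)
The changes there are mainly configuration changes, extra log output, saving the weights, and converting the weights to Paddle format.
Configuration change: the training batch size is set to 50.
Weight conversion script: https://github.com/co63oc/NewtonNet/blob/fix1/scripts/convert_paddle.py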
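The contents of convert_paddle.py are not shown here. As a minimal sketch of what a torch-to-Paddle weight conversion usually involves, the function below works on a plain dict of numpy arrays and transposes Linear weights, because `torch.nn.Linear` stores `[out_features, in_features]` while `paddle.nn.Linear` stores `[in_features, out_features]`. The function name `torch_to_paddle_state`, the `keep_layout` parameter, and the rule "every 2-D `*.weight` is a Linear weight" are assumptions for illustration, not the script's actual logic.

```python
import numpy as np


def torch_to_paddle_state(torch_state, keep_layout=()):
    """Hypothetical converter: name -> numpy array (torch layout) to Paddle layout.

    Assumption: every 2-D "*.weight" entry belongs to a Linear layer, except the
    names listed in `keep_layout` (e.g. embedding tables, which use the same
    [num_embeddings, dim] layout in both frameworks).
    """
    paddle_state = {}
    for name, value in torch_state.items():
        arr = np.asarray(value)
        is_linear_weight = (
            name.endswith(".weight") and arr.ndim == 2 and name not in keep_layout
        )
        # torch Linear: [out, in]; paddle Linear: [in, out] -> transpose.
        paddle_state[name] = arr.T.copy() if is_linear_weight else arr.copy()
    return paddle_state
```

On the torch side, the input dict would typically be built with `{k: v.detach().cpu().numpy() for k, v in model.state_dict().items()}`; which keys need `keep_layout` depends on the real NewtonNet module names.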
Run the script:

cd scripts
python newtonnet_train.py -c config.yml

Output log:
https://github.com/co63oc/NewtonNet/blob/fix1/scripts/torch_newtonnet.log

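This PR is the Paddle version and requires paddle_geometric and paddle_scatter to be installed.
Config file: interatomic_potentials/configs/newtonnet/newtonnet.yaml
Weights: interatomic_potentials/configs/newtonnet/newtonnet.pdparams. This file only loads initialization parameters identical to the torch version; it is not a trained checkpoint.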
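Since newtonnet.pdparams exists only to start both frameworks from identical parameters, a quick parity check between the two initial state dicts is useful. The sketch below assumes both sides have already been exported to dicts of numpy arrays with matching names; `compare_state_dicts` and its tolerances are illustrative choices, not part of the PR.

```python
import numpy as np


def compare_state_dicts(ref, other, rtol=1e-6, atol=1e-8):
    """Return a list of human-readable mismatches between two state dicts.

    Hypothetical helper: both arguments map parameter names to numpy arrays.
    An empty list means same keys, same shapes, and values within tolerance.
    """
    problems = []
    for name in sorted(set(ref) | set(other)):
        if name not in ref or name not in other:
            problems.append(f"{name}: missing in one state dict")
            continue
        a, b = np.asarray(ref[name]), np.asarray(other[name])
        if a.shape != b.shape:
            problems.append(f"{name}: shape {a.shape} vs {b.shape}")
        elif not np.allclose(a, b, rtol=rtol, atol=atol):
            diff = float(np.max(np.abs(a - b)))
            problems.append(f"{name}: max abs diff {diff:.3e}")
    return problems
```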
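Training requires updating the weights with statistics computed from the dataset (stats_calc), so a pretrained_need_update_by_train_loader option was added: when it is set, the weights are updated after the dataset has been loaded.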
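The flow above (load initial weights, then overwrite dataset-dependent statistics from the training data) can be sketched as follows. Everything concrete here is an assumption: the batch fields `E` and `n_atoms`, the state-dict keys `normalizer.mean` / `normalizer.std`, and the helper names are hypothetical, and the real stats_calc in NewtonNet may compute different statistics. Only the gating flag mirrors the PR's pretrained_need_update_by_train_loader option.

```python
import numpy as np


def compute_energy_stats(batches):
    """Hypothetical stand-in for stats_calc: mean/std of per-atom energy."""
    per_atom = np.concatenate(
        [np.asarray(b["E"], dtype=np.float64) / np.asarray(b["n_atoms"]) for b in batches]
    )
    return per_atom.mean(), per_atom.std()


def maybe_update_by_train_loader(state_dict, batches, need_update):
    """Overwrite dataset-dependent entries after loading pretrained weights.

    `need_update` plays the role of pretrained_need_update_by_train_loader.
    The key names below are assumptions for illustration only.
    """
    if not need_update:
        return state_dict
    mean, std = compute_energy_stats(batches)
    updated = dict(state_dict)  # leave the loaded dict untouched
    updated["normalizer.mean"] = np.array(mean)
    updated["normalizer.std"] = np.array(std)
    return updated
```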
Added a clip_grad config option; the model parameters need to be modified during the forward pass.
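The PR does not say which clipping rule clip_grad applies. A common choice is clipping by global norm, which in Paddle is usually configured by passing `paddle.nn.ClipGradByGlobalNorm(clip_norm)` as the optimizer's `grad_clip` argument (the torch counterpart is `torch.nn.utils.clip_grad_norm_`). The numpy sketch below only illustrates what global-norm clipping computes; it assumes that rule and is not the PR's implementation.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by one factor so their joint L2 norm <= max_norm.

    Assumption: clip_grad means global-norm clipping (it could also be
    per-parameter or by-value clipping in the actual PR).
    """
    total_norm = float(np.sqrt(sum(float(np.sum(g * g)) for g in grads)))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm
```

A single shared scale factor preserves the direction of the full gradient vector, which is why global-norm clipping is often preferred over clipping each tensor independently.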
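Added a precision option to the model config for changing the model's numerical precision. Changing the model precision also requires changing the dataset precision: delete the existing processed dataset interatomic_potentials/example_data/aspirin/ccsd_train/processed/data.pt, and it will be regenerated on the next run.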
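Why the cached data.pt has to go: it stores tensors already cast to the old precision. The sketch below shows both halves of the idea under stated assumptions: samples are dicts of numpy arrays, only floating-point fields are cast (integer fields such as atomic numbers keep their dtype), and the helper names `cast_sample` / `invalidate_processed_cache` are hypothetical.

```python
from pathlib import Path

import numpy as np

# Hypothetical mapping from the config's precision string to a numpy dtype.
_DTYPES = {"float32": np.float32, "float64": np.float64}


def cast_sample(sample, precision):
    """Cast floating-point fields to the configured precision; keep ints as-is."""
    dtype = _DTYPES[precision]
    return {
        key: value.astype(dtype) if np.issubdtype(value.dtype, np.floating) else value
        for key, value in sample.items()
    }


def invalidate_processed_cache(path):
    """Delete a cached processed file so it is rebuilt in the new precision.

    Mirrors the manual step of removing processed/data.pt; returns True if a
    file was removed.
    """
    p = Path(path)
    if p.exists():
        p.unlink()
        return True
    return False
```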
Run the script:

cd interatomic_potentials
python train.py -c configs/newtonnet/newtonnet.yaml

Run log: https://github.com/co63oc/NewtonNet/blob/fix1/scripts/paddle_newtonnet.log

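Test: with the same initialization parameters loaded and the same dataset read in the same order, the losses agree over two or more epochs of training. Because the model relies heavily on scatter operations, floating-point error accumulates as the number of training epochs grows. For example, steps 15 to 19 of epoch 2 agree to the four decimal places printed by torch: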
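The floating-point drift mentioned above can be illustrated without either framework. A scatter-add sums many values into the same output slot, and floating-point addition is not associative, so two implementations (or two GPU runs using atomic adds) that accumulate in a different order can produce slightly different results from identical inputs. This is a minimal sketch of one plausible source of the drift, not an analysis of the actual paddle_scatter or torch kernels.

```python
def scatter_add(values, index, num_segments):
    """Sum values[k] into out[index[k]], in iteration order."""
    out = [0.0] * num_segments
    for v, i in zip(values, index):
        out[i] += v  # accumulation order = iteration order
    return out


values = [0.1, 0.2, 0.3]
index = [0, 0, 0]
forward = scatter_add(values, index, 1)[0]
backward = scatter_add(values[::-1], index[::-1], 1)[0]
print(forward, backward)  # 0.6000000000000001 0.6
```

Each such difference is on the order of one ulp, but a message-passing model performs many scatters per step and feeds the results into the next update, so the gap can widen over epochs.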

torch

batch15 tensor(40.9183, device='cuda:0', dtype=torch.float64, grad_fn=<AddBackward0>)
batch16 tensor(34.6856, device='cuda:0', dtype=torch.float64, grad_fn=<AddBackward0>)
batch17 tensor(44.9074, device='cuda:0', dtype=torch.float64, grad_fn=<AddBackward0>)
batch18 tensor(52.8147, device='cuda:0', dtype=torch.float64, grad_fn=<AddBackward0>)
batch19 tensor(38.0302, device='cuda:0', dtype=torch.float64, grad_fn=<AddBackward0>) 

paddle

[2026/04/08 10:07:30] ppmat INFO: Train: Epoch [2/5] | Step: [15/19] | lr: 0.001000 | reader_cost: 0.000135 | batch_cost: 4.812305 | loss(loss): 40.918344
[2026/04/08 10:07:37] ppmat INFO: Train: Epoch [2/5] | Step: [16/19] | lr: 0.001000 | reader_cost: 0.000164 | batch_cost: 7.384202 | loss(loss): 34.685635
[2026/04/08 10:07:45] ppmat INFO: Train: Epoch [2/5] | Step: [17/19] | lr: 0.001000 | reader_cost: 0.000152 | batch_cost: 7.996556 | loss(loss): 44.907362
[2026/04/08 10:07:50] ppmat INFO: Train: Epoch [2/5] | Step: [18/19] | lr: 0.001000 | reader_cost: 0.000135 | batch_cost: 5.235708 | loss(loss): 52.814710
[2026/04/08 10:07:58] ppmat INFO: Train: Epoch [2/5] | Step: [19/19] | lr: 0.001000 | reader_cost: 0.000135 | batch_cost: 7.812943 | loss(loss): 38.030187
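The per-step agreement above can be checked mechanically rather than by eye. The sketch below parses trimmed copies of the log lines shown above; the two regexes are assumptions about the line formats and may need adjusting for the full log files. Note that torch's tensor repr rounds to four decimals, so differences below about 5e-5 cannot be resolved from these prints.

```python
import re

# Assumed line formats, based on the excerpts above.
TORCH_RE = re.compile(r"batch(\d+) tensor\(([-\d.]+)")
PADDLE_RE = re.compile(r"Step: \[(\d+)/\d+\].*loss\(loss\): ([-\d.]+)")


def parse_losses(lines, pattern):
    """Map step number -> loss for every line matching `pattern`."""
    losses = {}
    for line in lines:
        match = pattern.search(line)
        if match:
            losses[int(match.group(1))] = float(match.group(2))
    return losses


torch_log = [
    "batch15 tensor(40.9183, device='cuda:0', dtype=torch.float64)",
    "batch16 tensor(34.6856, device='cuda:0', dtype=torch.float64)",
    "batch17 tensor(44.9074, device='cuda:0', dtype=torch.float64)",
    "batch18 tensor(52.8147, device='cuda:0', dtype=torch.float64)",
    "batch19 tensor(38.0302, device='cuda:0', dtype=torch.float64)",
]
paddle_log = [
    "Train: Epoch [2/5] | Step: [15/19] | loss(loss): 40.918344",
    "Train: Epoch [2/5] | Step: [16/19] | loss(loss): 34.685635",
    "Train: Epoch [2/5] | Step: [17/19] | loss(loss): 44.907362",
    "Train: Epoch [2/5] | Step: [18/19] | loss(loss): 52.814710",
    "Train: Epoch [2/5] | Step: [19/19] | loss(loss): 38.030187",
]

torch_losses = parse_losses(torch_log, TORCH_RE)
paddle_losses = parse_losses(paddle_log, PADDLE_RE)
max_diff = max(abs(torch_losses[s] - paddle_losses[s]) for s in torch_losses)
print(f"steps {sorted(torch_losses)}: max |torch - paddle| = {max_diff:.1e}")
```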

Added paddle_save_fix: calling paddle.save directly could lose weights, so the call was changed to copy the weights before saving.
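The PR does not describe why weights went missing, only that copying them first avoids it. The sketch below illustrates the "copy, then save" pattern with stand-ins: numpy arrays play the role of Paddle tensors and pickle plays the role of paddle.save. The helper names are hypothetical; with real Paddle tensors, the analogous snapshot step would be something like `{k: v.clone() for k, v in model.state_dict().items()}` before calling `paddle.save`.

```python
import pickle

import numpy as np


def save_state_dict_copy(state_dict, path):
    """Save an independent snapshot of every entry (hypothetical helper).

    Copying first means the saved object shares no storage with live
    parameters, so later in-place updates cannot affect what is written.
    """
    snapshot = {name: np.array(value, copy=True) for name, value in state_dict.items()}
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)
    return snapshot


def load_state_dict(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```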


paddle-bot bot commented Apr 8, 2026

Thanks for your contribution!

paddle-bot added the contributor (External developers) label Apr 8, 2026