Input scaling, optimiser choice and other changes #3
dquigley533 wants to merge 1 commit into TheodoreWolf:main from
Conversation
…parameters needed. Switched to LBFGS since the examples use whole-batch training. Adjusted training so that physics_loss dominates. Used smooth activation functions.
Hi @dquigley533 and thanks for the PR! Sorry for not getting back to you sooner. My suggestion is, instead of modifying the existing code, to subclass your improvements as a new class and then simply add the improved versions as an additional section in the notebook. That way, readers coming from Medium won't be confused by different code, while still seeing clearly that what I've done is not optimal.
I came across your Medium post on this - thanks for making the code available. I'm currently learning about PINNs myself and found your tutorial to be a more useful introduction than many things I've read.
In my further reading and experimentation I made some tweaks which you might consider improvements, hence the PR in case you'd like to incorporate them.
Input/output scaling. The input time is scaled into [0, 1], as is the output. This dramatically reduces the number of parameters needed to reproduce the PDE solution well; I'm using a few dozen nodes in a single hidden layer.
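As a rough sketch of that idea (the domain bounds, ranges, and layer width here are illustrative assumptions, not values from the PR):

```python
import torch

# Assumed raw domain and output range; in practice these come from
# the problem setup or the training data.
t_min, t_max = 0.0, 10.0
u_min, u_max = -2.0, 3.0

def scale_t(t):
    """Scale raw time into [0, 1] before feeding it to the network."""
    return (t - t_min) / (t_max - t_min)

def unscale_u(u_hat):
    """Map the network's [0, 1]-scale output back to physical units."""
    return u_min + (u_max - u_min) * u_hat

# With scaled inputs/outputs, a small network suffices:
# a few dozen nodes in a single hidden layer.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32),
    torch.nn.GELU(),
    torch.nn.Linear(32, 1),
)
```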
Switch optimiser. Your example was training on the whole dataset rather than on mini-batches, so AdamW was unnecessary: you have exact (rather than estimated) gradients. With LBFGS, training converges in a handful of steps. Note that I also switched the activation function from ReLU to GELU to get smooth gradients.
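A minimal sketch of full-batch LBFGS training, assuming PyTorch (the model and the sine-curve stand-in data are illustrative). PyTorch's LBFGS requires a closure because it may re-evaluate the loss several times per step:

```python
import torch

torch.manual_seed(0)
t = torch.linspace(0, 1, 100).unsqueeze(1)   # scaled inputs, full batch
u = torch.sin(2 * torch.pi * t)              # stand-in targets

model = torch.nn.Sequential(
    torch.nn.Linear(1, 32),
    torch.nn.GELU(),                          # smooth activation for LBFGS
    torch.nn.Linear(32, 1),
)
opt = torch.optim.LBFGS(model.parameters(), max_iter=20)

def closure():
    # The forward/backward pass lives inside the closure so LBFGS
    # can re-evaluate the loss during its line search.
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(t), u)
    loss.backward()
    return loss

for _ in range(10):   # a handful of outer steps
    opt.step(closure)
```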
With those changes the problem fixed by the L2 regularisation doesn't seem to occur, but I've left that in anyway.
In the examples which use `physics_loss`, I've weighted it massively higher than the MSE loss on the training points, to avoid overfitting to the noise on those points.

The whole thing now runs in a handful of seconds on a CPU, which did wonders for my confidence that PINNs would be tractable for problems in higher numbers of dimensions.
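The weighting scheme above can be sketched as follows; the weight value, function name, and dummy tensors are illustrative assumptions, not the PR's actual code:

```python
import torch

# Assumed weight: large enough that the physics residual dominates
# the data-fitting term, so the fit is anchored by the PDE rather
# than by noise on the training points.
PHYSICS_WEIGHT = 1e4

def total_loss(u_pred, u_data, residual):
    """Combine data MSE with a heavily weighted physics residual."""
    data_loss = torch.mean((u_pred - u_data) ** 2)
    physics_loss = torch.mean(residual ** 2)
    return data_loss + PHYSICS_WEIGHT * physics_loss

# Example with dummy tensors:
u_pred = torch.tensor([1.0, 2.0])
u_data = torch.tensor([1.1, 1.9])
residual = torch.tensor([0.01, -0.02])
loss = total_loss(u_pred, u_data, residual)
```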
Obviously feel free to ignore - just felt compelled to share what I learned by trying to push your example to its limits.