Hi, I would like to implement the diffusion pretraining described in the paper. I understand that in ORB diffusion pretraining:
- Noise is added as $x_\sigma = x_0 + \sigma \epsilon$
- The model predicts a vector that is interpreted as $\epsilon$, i.e. it is trained to match the noise added to $x_0$, not to recover the original positions or follow some other parameterization.
- $\sigma$ is sampled log-uniformly from a range.
- The loss is an MSE on $\epsilon$, weighted by $1/\sigma^2$.
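For concreteness, here is the training step as I currently understand it. This is just my sketch, not the paper's code: the `sigma_min`/`sigma_max` values, the per-structure sampling, and the stand-in `model` are all my assumptions (and are exactly what my questions below are about).

```python
import numpy as np

def diffusion_pretrain_loss(model, x0, rng, sigma_min=0.01, sigma_max=1.0):
    """One diffusion pretraining step as I understand it (range values are my guess)."""
    # Sample sigma log-uniformly, one value per structure
    # (per-batch vs. per-structure is one of my questions below).
    u = rng.random((x0.shape[0], 1))
    sigma = sigma_min * (sigma_max / sigma_min) ** u
    # Add noise: x_sigma = x0 + sigma * eps
    eps = rng.standard_normal(x0.shape)
    x_sigma = x0 + sigma * eps
    # Model predicts eps directly; note no sigma conditioning is passed in.
    eps_pred = model(x_sigma)
    # MSE on eps, weighted by 1 / sigma^2.
    return np.mean((eps_pred - eps) ** 2 / sigma ** 2)
```

For example, with a trivial stand-in model that always predicts zero noise, the loss is finite and positive:

```python
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 3))
loss = diffusion_pretrain_loss(lambda x: np.zeros_like(x), x0, rng)
```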
I’m a bit unsure about the following points:
- $\sigma$ is not passed as an explicit input to the network (e.g. via a sinusoidal embedding), so the model is trained unconditionally across $\sigma$. Is this correct? I.e. a single network, with one set of parameters, trained on a mixture of noise levels.
- Is the $1/\sigma^2$ factor the only $\sigma$-dependent normalization?
- Is $\sigma$ sampled per batch or per structure?
- The paper says $\sigma$ increases over time, but doesn't give the range/values that were used. Could you share them? :)
Any clarification would be very helpful.