Hey authors. Thank you for this insanely insightful paper. I read through the entire paper and have a few questions that I hope you can address in simple terms:
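- How does the gradient-based scoring function achieve O(M) complexity, compared to O(3M) for SGP? Asymptotically these are the same order, so I assume the factor of 3 counts something concrete. What does it count, and where does the saving come from?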
- For the bootstrap correction, are the additional L steps taken after the meta-model's global weights (outer loop) are updated from the K inner-loop steps? Or are they taken within the inner loop, right after the K steps?
- Backpropagation is obviously still needed to adapt to the K new steps, so how exactly does this save memory? And doesn't adding L more steps of computation increase the overall computation time?
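To make the bootstrap and memory questions concrete, here is a toy sketch of how I currently picture the second reading: the L steps continue from the K-th inner iterate and only serve as a fixed target before the meta-update. It assumes the correction follows a bootstrapped meta-gradient pattern. The quadratic loss, the step-size meta-parameter `eta`, and the function names are all my own hypothetical choices, not your method.

```python
"""Toy sketch of my reading of the bootstrap correction (hypothetical, not the paper's code).

Assumptions: a 1-D quadratic inner loss f(x) = 0.5 * a * x**2, plain gradient
descent in the inner loop, and a learnable step size `eta` as the meta-parameter.
"""


def inner_step(x: float, eta: float, a: float) -> float:
    # One gradient-descent step on f(x) = 0.5 * a * x**2 (its gradient is a * x).
    return x - eta * a * x


def bootstrap_meta_grad(x0: float, eta: float, a: float, K: int, L: int):
    # 1) Differentiable part: unroll K steps and keep every input state on a
    #    "tape", as reverse-mode autodiff would. This storage grows with K.
    tape = []
    x = x0
    for _ in range(K):
        tape.append(x)
        x = inner_step(x, eta, a)
    x_K = x

    # 2) Bootstrap target: L more steps starting from x_K, treated as a constant
    #    (stop-gradient). No states are stored, so these steps add compute, not memory.
    target = x_K
    for _ in range(L):
        target = inner_step(target, eta, a)

    # 3) Meta-loss: squared distance between x_K and the fixed target.
    meta_loss = 0.5 * (x_K - target) ** 2

    # 4) Reverse pass over the K taped steps only (the target gets no gradient).
    g_x = x_K - target  # d(meta_loss)/d(x_K)
    g_eta = 0.0
    for x_k in reversed(tape):
        g_eta += g_x * (-a * x_k)  # direct dependence of x_{k+1} on eta
        g_x *= 1.0 - eta * a       # chain rule through x_{k+1} = (1 - eta * a) * x_k
    return meta_loss, g_eta, len(tape)


loss, grad, tape_len = bootstrap_meta_grad(x0=1.0, eta=0.1, a=2.0, K=5, L=3)
print(f"meta-loss={loss:.6f}, d/d(eta)={grad:.6f}, stored states={tape_len}")
```

If this reading is right, the tape has K entries no matter how large L is, so memory would scale with K while the L bootstrap steps still cost extra forward compute. Is that the trade-off you intend?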
Thank you for your time!