Hi, I like your blog post and code, it helped me get started with some things on this subject in JAX, so thanks!
I have a comment about the fisher_vp function. It seems to me that two possible two-index tensors one could construct are
- $F(w) = E_{X,Y}[\nabla L(X,Y,w)\, \nabla^T L(X,Y,w)]$
- $F(w) = \nabla E_{X,Y}[L(X,Y,w)]\, \nabla^T E_{X,Y}[L(X,Y,w)]$
and in the way fisher_vp is used in the empirical or true Fisher step:

naturalgradient/natural_grad.py, lines 148 to 151 (commit d7d0a12):

```python
loss, grads = jax.value_and_grad(mean_cross_entropy)(params, batch)
f = lambda w: mean_cross_entropy(w, batch)
fvp = lambda v: fisher_vp(f, params, v)
ngrad, _ = jax.scipy.sparse.linalg.cg(fvp, grads, maxiter=10)  # approx solve
```

naturalgradient/natural_grad.py, line 222 (commit d7d0a12):

```python
ngrad, _ = jax.scipy.sparse.linalg.cg(fvp, grads, maxiter=10)  # approx solve
```
the second of these is being used: the gradient of the loss is averaged over the batch before being passed to fisher_vp. However, the Fisher matrix requires the averaging to occur over the two-index tensors, i.e. over the per-example outer products $\nabla L\, \nabla^T L$, not over the gradients themselves.
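To make the distinction concrete, here is a minimal sketch of the two quantities in JAX. It uses a toy least-squares loss (`example_loss`) and hypothetical helper names of my own (`empirical_fisher_vp`, `mean_grad_fisher_vp`), not the code from the post; the parameters are a flat array to keep the example short:

```python
import jax
import jax.numpy as jnp

# Hypothetical single-example loss (a toy least-squares model, not the
# post's mean_cross_entropy): w is the parameter vector, (x, y) one example.
def example_loss(w, x, y):
    return 0.5 * (jnp.dot(x, w) - y) ** 2

def empirical_fisher_vp(w, xs, ys, v):
    """F v with F = mean_i g_i g_i^T (the first tensor above):
    average the per-example outer products applied to v."""
    def single(x, y):
        g = jax.grad(example_loss)(w, x, y)  # per-example gradient g_i
        return g * jnp.dot(g, v)             # g_i (g_i^T v)
    return jax.vmap(single)(xs, ys).mean(axis=0)

def mean_grad_fisher_vp(w, xs, ys, v):
    """What averaging the gradient first gives (the second tensor above):
    (mean_i g_i)(mean_i g_i)^T v, a rank-one matrix applied to v."""
    mean_loss = lambda w: jax.vmap(example_loss, (None, 0, 0))(w, xs, ys).mean()
    g_bar = jax.grad(mean_loss)(w)
    return g_bar * jnp.dot(g_bar, v)

w = jnp.array([1.0, -2.0])
xs = jax.random.normal(jax.random.PRNGKey(0), (8, 2))
ys = jnp.zeros(8)
v = jnp.array([0.5, 1.5])
# The two products generally disagree: averaging gradients before the
# outer product loses the per-example second-moment information.
```

Both versions are matrix-free, so either could be handed to `jax.scipy.sparse.linalg.cg` as the operator; the point is that only the first matches the empirical Fisher.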