Gradients are currently overwritten on each call. They need to accumulate instead, and an API is needed to zero them.
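
A minimal sketch of the intended behaviour, not the actual implementation: the `Param` class, `accumulate_grad`, `zero_grad`, and `backward_square` are all hypothetical names chosen for illustration. It assumes a scalar parameter and a loss of `value ** 2`, whose gradient is `2 * value`.

```python
class Param:
    """Hypothetical scalar parameter holding a value and its gradient."""

    def __init__(self, value):
        self.value = float(value)
        self.grad = 0.0

    def accumulate_grad(self, g):
        # Add to the stored gradient instead of overwriting it.
        self.grad += g

    def zero_grad(self):
        # Explicit reset, called by the user between optimisation steps.
        self.grad = 0.0


def backward_square(p):
    # Assumed loss: p.value ** 2, so d(loss)/d(value) = 2 * p.value.
    p.accumulate_grad(2.0 * p.value)


p = Param(3.0)
backward_square(p)
backward_square(p)   # second call adds to the first result
accumulated = p.grad
p.zero_grad()
cleared = p.grad
```

With accumulation, two backward calls sum their contributions, so the caller decides when to reset by calling `zero_grad()` rather than losing earlier gradients implicitly.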