Train with weight decay and momentum

I'm using SLS to train my own model, but I found that it behaves differently depending on whether I train with plain SGD or with SGD + weight decay + momentum.
When I use plain SGD, the step size increases at first, following an exponential trend, which is consistent with your published work.
However, if I use SGD + weight decay + momentum, the step size stays very stable (0.02 to 0.03) for most of training.
Can you explain why? Is SPS incompatible with optimizer momentum and weight decay?
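To make the question concrete, here is a minimal sketch of how a Polyak-type step size is computed and how coupled L2 weight decay changes the quantities it depends on. It assumes the SPS_max form gamma = min((f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), gamma_max) with f_i* = 0 and c = 0.5. The helpers `sps_max_step`, `add_l2`, and `heavy_ball_step` are hypothetical names for illustration, not the API of the SLS/SPS packages, and the heavy-ball update is just one possible way to combine an adaptive step with momentum.

```python
import numpy as np

def sps_max_step(loss, grad, c=0.5, gamma_max=1.0, f_star=0.0, eps=1e-12):
    # Assumed SPS_max rule: min((f_i(x) - f_i*) / (c * ||grad||^2), gamma_max)
    grad_sq = float(np.dot(grad, grad))
    return min((loss - f_star) / (c * grad_sq + eps), gamma_max)

def minibatch_loss_grad(w, x, y):
    # Squared loss 0.5 * (x.w - y)^2 for one sample, and its gradient w.r.t. w
    r = float(np.dot(x, w)) - y
    return 0.5 * r * r, r * x

def add_l2(loss, grad, w, wd):
    # Coupled L2 weight decay (hypothetical helper): the penalty (wd / 2) * ||w||^2
    # is added to the loss, and wd * w is added to the gradient
    return loss + 0.5 * wd * float(np.dot(w, w)), grad + wd * w

def heavy_ball_step(w, w_prev, grad, gamma, beta):
    # Hypothetical combination of the adaptive step with heavy-ball momentum
    return w - gamma * grad + beta * (w - w_prev)

w = np.array([1.0, -2.0])
x = np.array([1.0, 1.0])
y = 0.0

loss, grad = minibatch_loss_grad(w, x, y)
gamma_plain = sps_max_step(loss, grad)          # about 0.5

loss_wd, grad_wd = add_l2(loss, grad, w, wd=0.1)
gamma_wd = sps_max_step(loss_wd, grad_wd)       # about 0.667

# w_prev equals w here, so the momentum term is zero on this first step
w_next = heavy_ball_step(w, w_prev=w.copy(), grad=grad_wd, gamma=gamma_wd, beta=0.9)
```

Whether the penalty term enters the loss, the gradient, or both depends on how a given optimizer implements weight decay, which is one reason the same step-size rule can produce different trajectories under plain SGD and SGD + weight decay.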

Train with weight decay and momentum #4