Train with weight decay and momentum #4

@milliema

I'm using SLS to train my own model, and I see different behavior with plain SGD than with SGD + weight decay + momentum.
With plain SGD, the step size increases at first, following an exponential trend, which is consistent with your published work.
However, with SGD + weight decay + momentum, the step size stays very stable (0.02~0.03) most of the time.
Can you explain why? Is SPS incompatible with optimizer momentum and weight decay?
