Awesome Optimizers List

Curated optimizer-design papers from 2022 onward, listed in reverse chronological order. Dates use YYMM format (for example, 2412 is December 2024).

CSV: data/optimizers.csv
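
A minimal sketch of loading the CSV and sorting it newest first, as the table in this README is ordered. It assumes the file is comma-delimited with a header row of `Date`, `Optimizer Name`, and `Advantage`, and that `Date` is a YYMM string; neither assumption is confirmed by this README, and `load_optimizers` is a hypothetical helper name.

```python
import csv
import io


def load_optimizers(csv_text):
    """Parse CSV text into row dicts, newest first.

    Assumes (not confirmed by the repository) a header row with the columns
    Date, Optimizer Name, Advantage, where Date is a YYMM string.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # YYMM values compare chronologically as integers. sorted() is stable,
    # so rows that share a month keep their original file order.
    return sorted(rows, key=lambda row: int(row["Date"]), reverse=True)


# Hypothetical two-row sample in the assumed layout.
sample = (
    "Date,Optimizer Name,Advantage\n"
    '2302,Lion,"Sign-based momentum optimizer discovered by symbolic search."\n'
    '2412,Muon,"Uses orthogonalized matrix updates for hidden-layer weights."\n'
)
for row in load_optimizers(sample):
    print(row["Date"], row["Optimizer Name"])
```

To read the real file under the same assumptions, pass the contents of `data/optimizers.csv` (opened with `newline=""` and `encoding="utf-8"`) to `load_optimizers`.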

Date	Optimizer Name	Advantage
2604	Newton-Muon	Adds input-side Newton preconditioning to Muon and reduces training steps and wall-clock time in reported GPT-2 pretraining runs.
2604	CLion	Applies cautious updates to Lion to improve generalization while preserving lightweight optimizer state and efficiency.
2604	Adam-HNAG	Reformulates Adam with a curvature-aware correction and provides accelerated convergence guarantees in the reported setting.
2602	TSR-Adam	Uses two-sided low-rank synchronization to reduce Adam-family communication cost in distributed training.
2602	NAMO	Combines Muon-style orthogonalized momentum with Adam-type noise adaptation to improve stability at negligible extra cost.
2601	Spectral Sphere Optimizer (SSO)	Constrains both weights and updates on a spectral sphere to improve LLM training stability and outperform AdamW and Muon in reported experiments.
2510	NorMuon	Combines Muon with neuron-wise normalization and second-order statistics to improve scalability and efficiency.
2510	Hill-ADAM	Alternates minimization and maximization phases to help Adam escape local minima in non-convex loss landscapes.
2510	DP-Adam-AC	Adds adaptive clipping to Adam-based private fine-tuning to improve the privacy-utility trade-off for localizable language models.
2509	Conda	Blends Adam-style adaptivity with column-normalized updates to improve optimization efficiency in LLM training.
2506	SPlus	Uses stable whitening-style preconditioning to cut gradient steps and wall-clock time in reported neural network training runs.
2505	PolarGrad	Unifies matrix-gradient preconditioning and introduces polar-decomposition updates that outperform Adam and Muon in reported studies.
2505	Gluon	Generalizes Muon and Scion within an LMO framework and improves layer-wise large-model optimization in the reported setting.
2504	Dion	Distributed orthonormalized updates that reduce large-scale training overhead while preserving Muon-style gains.
2502	Scion	Uses norm-constrained LMO updates that improve stability, memory efficiency, and hyperparameter transfer.
2502	D-Muon	Scales Muon-style orthogonalized updates to distributed LLM training and improves compute efficiency over strong AdamW baselines.
2412	Muon	Uses orthogonalized matrix updates for hidden-layer weights and is typically paired with AdamW for non-matrix parameters.
2411	MARS	Injects variance reduction into adaptive and sign-based optimizers and reports strong GPT-2 training gains.
2411	Cautious Optimizers	Adds a one-line cautious mask to momentum optimizers such as AdamW and Lion.
2411	ADOPT	A modified Adam that converges at the optimal rate for any choice of β2, with improved practical stability.
2409	SOAP	Blends Shampoo-style preconditioning with Adam-style moment updates.
2409	AdEMAMix	Mixes older-gradient EMAs into AdamW to improve token efficiency.
2406	Adam-mini	Uses fewer learning-rate groups to reduce optimizer memory with AdamW-like quality.
2405	SF-AdamW (Schedule-Free)	Removes explicit learning-rate schedules and simplifies tuning while keeping AdamW-style behavior.
2405	MicroAdam	Compresses optimizer-state updates to reduce memory overhead while preserving convergence quality.
2405	FAdam	Uses diagonal empirical Fisher preconditioning to make Adam behave more like a lightweight natural-gradient optimizer.
2312	AGD	Auto-switches preconditioning based on stepwise gradient differences to balance adaptivity and efficiency.
2310	AdaLOMO	Low-memory optimizer with adaptive learning rates for resource-constrained full-parameter LLM fine-tuning.
2309	AdaPlus	Adds Nesterov momentum and more precise stepsize control on top of AdamW-style updates.
2307	CoRe	All-in-one optimizer designed to work robustly across tasks with less retuning.
2307	CAME	Confidence-guided memory-efficient optimization for large-scale model training.
2307	Adam+CM	Adds critical momenta to Adam-style updates to improve exploration and escape poor minima.
2306	Prodigy	Parameter-free learner derived from D-Adaptation that reduces learning-rate tuning.
2306	LOMO	Fuses gradient computation and parameter updates to enable low-memory full-parameter LLM fine-tuning.
2305	WSAM	Revisits SAM with weighted sharpness to improve generalization while keeping optimization practical.
2305	UAdam	Unified Adam-type framework that studies convergence behavior across a broad class of Adam-family methods.
2305	Sophia	Scalable stochastic second-order optimizer for language-model pretraining.
2305	DoWG	Universal parameter-free gradient method that extends DoG-style step-size adaptation with stronger empirical performance.
2302	Lion	Sign-based momentum optimizer discovered by symbolic search.
2302	FOSI	Combines first-order optimizers with second-order curvature information for faster convergence on difficult objectives.
2302	DoG	Parameter-free dynamic step-size schedule that makes SGD-style optimization much less tuning-sensitive.
2301	D-Adaptation	Learning-rate-free optimization for SGD, Adam, and AdaGrad variants.
2211	VeLO	Learned optimizer trained at scale to transfer across tasks and architectures better than smaller learned optimizers.
2210	Amos	Adam-style optimizer with adaptive decay and scale-aware weight decay.
2210	AdaNorm	Corrects gradient-norm scaling to stabilize adaptive optimization for CNNs.
2208	Adan	Adaptive Nesterov momentum optimizer for faster and more stable deep-model training.
2206	GradaGrad	Non-monotone adaptive stochastic gradient method aimed at improving practical convergence over monotone variants.
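
Several rows above describe short update rules. As a rough illustration of two of them, the sketch below implements the commonly published form of the Lion step (the sign of an interpolated momentum) with an optional cautious mask of the kind the Cautious Optimizers row refers to. It is a plain-Python sketch on flat lists of floats, not any paper's reference implementation and not the CLion method itself; `lion_step`, its argument names, and the guard for the case where no coordinate agrees with the gradient are assumptions made for this example.

```python
def sign(x):
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0, cautious=False):
    """One Lion update on flat lists of floats.

    Returns (new_params, new_momentum). Illustrative sketch only.
    """
    # Interpolate momentum and gradient, then keep only the sign.
    update = [sign(beta1 * m + (1 - beta1) * g)
              for m, g in zip(momentum, grads)]
    if cautious:
        # Cautious mask: zero out coordinates whose update disagrees in sign
        # with the current gradient, then rescale the survivors by
        # n / (number kept) so the mean step size is preserved.
        mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grads)]
        # Assumption: guard against an all-zero mask by clamping the count to 1.
        scale = len(mask) / max(sum(mask), 1.0)
        update = [u * k * scale for u, k in zip(update, mask)]
    # Decoupled weight decay is applied alongside the signed update.
    new_params = [p - lr * (u + weight_decay * p)
                  for p, u in zip(params, update)]
    # The stored momentum uses a second, slower interpolation factor.
    new_momentum = [beta2 * m + (1 - beta2) * g
                    for m, g in zip(momentum, grads)]
    return new_params, new_momentum


# Toy example: the momentum points against the gradient in the first coordinate.
params, grads, momentum = [1.0, -2.0], [0.5, -0.3], [-1.0, -1.0]
print(lion_step(params, grads, momentum, lr=0.1))
print(lion_step(params, grads, momentum, lr=0.1, cautious=True))
```

In the toy example the first coordinate's signed update opposes its gradient, so the cautious variant leaves that parameter unchanged and takes a doubled step on the second coordinate.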
