A comprehensive, chronologically ordered collection of loss functions across all subdomains of deep learning and machine learning – with paper links, one-line descriptions, mathematical formulations, and implementation references.
350+ loss functions. 25+ categories. Every subdomain of AI.
If this resource helps your research or engineering work, please consider giving it a ⭐
- Audio, Music & Speech Generation – WaveNet to Stable Audio, 19 losses
- Video Generation & Understanding – VGAN to VideoPoet, 20 losses
- Time Series Forecasting – Pinball Loss to TimesFM, 23 losses
- Continual & Lifelong Learning – EWC to EASE, 18 methods
- Calibration, Fairness & Bias Mitigation – Brier Score to Group DRO, 18 losses
- Adversarial Robustness & OOD Detection – FGSM-AT to CIDER, 22 losses
- Anomaly Detection & Multi-Modal Learning – Deep SVDD to ImageBind, 17 losses
- Image-to-Image Translation – Total Variation to DoveNet, 16 losses
- Semi-Supervised Learning – Pseudo-Label to SoftMatch, 12 losses
- Optical Flow, Video & Pose – Horn-Schunck to SEA-RAFT, 33 losses
Core Categories (inline)
- Loss Selection Guide
- Key Mathematical Formulations
- Classification
- Regression
- Segmentation
- Object Detection (Bounding Box)
- Generative Models β GANs
- Generative Models β VAEs
- Generative Models β Diffusion & Flow
- Reconstruction & Perceptual
- Image Super-Resolution & Restoration
- Contrastive & Self-Supervised Learning
- Metric Learning & Face Recognition
- NLP & Language Modeling
- LLM Alignment (RLHF / DPO)
- Sequence-to-Sequence & Speech
- Reinforcement Learning
- Knowledge Distillation
- Regularization
- 3D Vision & Point Clouds
- Depth Estimation
- Medical Imaging
- Graph Neural Networks
- Recommendation Systems
- Multi-Task Learning
- Uncertainty Estimation
- Domain Adaptation
Extended Categories (separate files)
- Audio, Music & Speech Generation – 19 losses
- Video Generation & Understanding – 20 losses
- Time Series Forecasting – 23 losses
- Continual & Lifelong Learning – 18 methods
- Calibration, Fairness & Bias Mitigation – 18 losses
- Adversarial Robustness & OOD Detection – 22 losses
- Anomaly Detection & Multi-Modal Learning – 17 losses
- Image-to-Image Translation & Style Transfer – 16 losses
- Semi-Supervised Learning & Self-Training – 12 losses
- Optical Flow, Video Prediction & Pose Estimation – 33 losses
Resources
Not sure which loss to use? Here's a quick decision framework:
| Task | Default Choice | Class Imbalance | Noisy Labels | Need Calibration |
|---|---|---|---|---|
| Binary Classification | BCE | Focal Loss | SCE / GCE | Focal + Temp. Scaling |
| Multi-class Classification | Cross-Entropy | Class-Balanced CE | Label Smoothing | Label Smoothing |
| Semantic Segmentation | CE + Dice | Focal Tversky | – | – |
| Object Detection (box) | Smooth L1 + Focal | Focal Loss | – | – |
| Object Detection (IoU) | CIoU / GIoU | – | – | – |
| Image Generation (GAN) | Hinge / Non-Saturating | – | – | – |
| Image Generation (Diffusion) | DDPM (ε-prediction) | – | – | – |
| Super-Resolution | L1 + Perceptual + GAN | – | – | – |
| Self-Supervised (vision) | InfoNCE / DINO | – | – | – |
| Face Recognition | ArcFace / AdaFace | Sub-center ArcFace | ElasticFace | – |
| Language Modeling | Cross-Entropy (NTP) | – | – | – |
| LLM Alignment | DPO / SimPO | – | – | – |
| Speech Recognition | CTC / RNN-T | – | – | – |
| RL (value-based) | DQN / Double DQN | – | – | – |
| RL (policy-based) | PPO | – | – | – |
| Regression | MSE / Huber | – | Huber | NLL w/ variance |
| Metric Learning | Triplet / Proxy Anchor | – | – | – |
| Medical Segmentation | Dice + Boundary | Tversky / Focal Tversky | – | – |
| 3D Reconstruction | Chamfer + Normal | – | – | – |
| Depth Estimation | Scale-Invariant | – | – | – |
| Time Series | MSE / Quantile | – | Huber | CRPS |
| Continual Learning | EWC / DER++ | – | – | – |
| Fairness | Group DRO | – | – | – |
Cross-Entropy Loss
Binary Cross-Entropy
Focal Loss
Dice Loss
Triplet Loss
InfoNCE / Contrastive Loss
KL Divergence
DDPM Loss (simplified)
DPO Loss
IoU Loss
ArcFace Loss
Wasserstein Distance (WGAN)
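For quick reference, the standard forms of the objectives listed above can be written compactly as follows (a sketch in conventional notation following the cited papers; symbols such as p_t, τ, s, m carry their usual meanings):

```latex
\begin{align*}
\mathcal{L}_{\mathrm{CE}} &= -\sum_{c} y_c \log p_c \\
\mathcal{L}_{\mathrm{BCE}} &= -\big[y \log p + (1-y)\log(1-p)\big] \\
\mathcal{L}_{\mathrm{Focal}} &= -(1-p_t)^{\gamma} \log p_t \\
\mathcal{L}_{\mathrm{Dice}} &= 1 - \frac{2\,|P \cap G|}{|P| + |G|} \\
\mathcal{L}_{\mathrm{Triplet}} &= \max\big(0,\; d(a,p) - d(a,n) + m\big) \\
\mathcal{L}_{\mathrm{InfoNCE}} &= -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k} \exp(\mathrm{sim}(z_i, z_k)/\tau)} \\
D_{\mathrm{KL}}(P \,\|\, Q) &= \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \\
\mathcal{L}_{\mathrm{DDPM}} &= \mathbb{E}_{t,\,x_0,\,\epsilon}\,\big\|\epsilon - \epsilon_\theta(x_t, t)\big\|^2 \\
\mathcal{L}_{\mathrm{DPO}} &= -\log \sigma\!\Big(\beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\Big) \\
\mathcal{L}_{\mathrm{IoU}} &= 1 - \frac{|B \cap \hat{B}|}{|B \cup \hat{B}|} \\
\mathcal{L}_{\mathrm{ArcFace}} &= -\log \frac{e^{s \cos(\theta_y + m)}}{e^{s \cos(\theta_y + m)} + \sum_{j \neq y} e^{s \cos \theta_j}} \\
W(P_r, P_g) &= \sup_{\|f\|_{L} \le 1}\; \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]
\end{align*}
```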
0/1 Loss (1950) – The theoretical misclassification indicator; 1 if prediction ≠ label, 0 otherwise. Non-differentiable, foundational to learning theory. 📄 Statistical Decision Functions – Wald, A.
Cross-Entropy Loss / Log Loss / Negative Log-Likelihood (1948) – Measures divergence between predicted probability distribution and true labels; the default loss for multi-class classification.
📄 A Mathematical Theory of Communication – Shannon, C.E.
💻 torch.nn.CrossEntropyLoss
Binary Cross-Entropy (1958) – Cross-entropy specialized for two-class or multi-label problems; operates on each output independently.
📄 Derived from logistic regression – Cox, D.R. (1958)
💻 torch.nn.BCEWithLogitsLoss
Hinge Loss / SVM Loss (1995) – Maximizes the margin between classes; the core loss behind Support Vector Machines.
📄 Support-Vector Networks – Cortes, C. & Vapnik, V.
💻 torch.nn.MultiMarginLoss
Knowledge Distillation Loss / Soft Cross-Entropy (2015) – Trains a student network to mimic a teacher by matching softened output distributions.
📄 Distilling the Knowledge in a Neural Network – Hinton, G., Vinyals, O. & Dean, J.
💻 torch.nn.KLDivLoss
Large-Margin Softmax Loss (L-Softmax) (2016) – Introduces angular margin constraints into softmax for intra-class compactness and inter-class separability. 📄 Large-Margin Softmax Loss for Convolutional Neural Networks – Liu, W., Wen, Y., Yu, Z. & Yang, M. 💻 wy1iu/LargeMargin_Softmax_Loss
Center Loss (2016) – Penalizes distance of features from learned class centers, improving discriminative feature learning. 📄 A Discriminative Feature Learning Approach for Deep Face Recognition – Wen, Y., Zhang, K., Li, Z. & Qiao, Y. 💻 KaiyangZhou/pytorch-center-loss
Label Smoothing (2016) – Replaces hard one-hot targets with soft targets, preventing overconfident predictions and improving generalization.
📄 Rethinking the Inception Architecture for Computer Vision – Szegedy, C. et al.
💻 torch.nn.CrossEntropyLoss(label_smoothing=...)
Sparsemax Loss (2016) – Sparse alternative to softmax that assigns exactly zero probability to irrelevant classes. 📄 From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification – Martins, A.F.T. & Astudillo, R.F. 💻 deep-spin/entmax
Focal Loss (2017) – Down-weights well-classified examples to focus training on hard negatives; designed for extreme class imbalance. 📄 Focal Loss for Dense Object Detection – Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. 💻 AdeelH/pytorch-multi-class-focal-loss
Generalized Cross-Entropy (GCE) (2018) – Noise-robust loss interpolating between MAE and cross-entropy via a tunable parameter q. 📄 Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels – Zhang, Z. & Sabuncu, M.R. 💻 AlanChou/Truncated-Loss
Complement Objective Training (COT) (2019) – Augments cross-entropy with a complement objective that neutralizes non-target class probabilities. 📄 Complement Objective Training – Chen, H.-Y. et al. 💻 henry8527/COT
Class-Balanced Loss (2019) – Re-weights loss by the effective number of samples per class for long-tailed distributions. 📄 Class-Balanced Loss Based on Effective Number of Samples – Cui, Y. et al. 💻 vandit15/Class-balanced-loss-pytorch
Symmetric Cross-Entropy (SCE) (2019) – Combines standard CE with reverse CE for robustness to label noise. 📄 Symmetric Cross Entropy for Robust Learning with Noisy Labels – Wang, Y. et al.
Bi-Tempered Logistic Loss (2019) – Two temperature parameters bound the loss (handling mislabeled data) and produce heavy-tailed softmax (handling outliers). 📄 Robust Bi-Tempered Logistic Loss Based on Bregman Divergences – Amid, E. et al. 💻 google/bi-tempered-loss
Taylor Cross-Entropy Loss (2020) – Taylor series expansion of CE creating a noise-robust loss. 📄 Can Cross Entropy Loss Be Robust to Label Noise? – Feng, L. et al.
Asymmetric Loss (ASL) (2021) – Different focusing levels for positive and negative samples in multi-label classification. 📄 Asymmetric Loss For Multi-Label Classification – Ben-Baruch, E. et al. 💻 Alibaba-MIIL/ASL
Poly Loss (2022) – Views loss functions as polynomial expansions and adjusts leading coefficients; generalizes CE and focal loss. 📄 PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions – Leng, Z. et al. 💻 abhuse/polyloss-pytorch
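To make the focusing mechanism concrete, here is a minimal PyTorch sketch of the binary focal loss described above (the helper name `focal_loss` and the defaults gamma=2.0, alpha=0.25 are choices made here, not from any reference implementation):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al., 2017): scales BCE by (1 - p_t)^gamma
    so well-classified examples contribute almost nothing."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)          # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```

On confidently correct predictions the modulating factor drives the loss far below plain BCE, which is exactly the intended behavior for imbalanced dense prediction.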
Mean Absolute Error (MAE) / L1 Loss (~1757) – Penalizes absolute differences; robust to outliers but non-smooth gradient at zero.
📄 Attributed to Boscovich, R.J. (1757)
💻 torch.nn.L1Loss
Mean Squared Error (MSE) / L2 Loss (~1805) – Penalizes squared differences; sensitive to outliers. The method of least squares.
📄 Legendre, A.-M. (1805); Gauss, C.F. (1809)
💻 torch.nn.MSELoss
Huber Loss (1964) – MSE for small errors, MAE for large errors. Robust to outliers with smooth gradients near zero.
📄 Robust Estimation of a Location Parameter – Huber, P.J.
💻 torch.nn.HuberLoss
Tukey's Biweight Loss (1974) – Redescending M-estimator that completely rejects gross outliers beyond a threshold. 📄 The Fitting of Power Series, Meaning Polynomials, Illustrated on Band-Spectroscopic Data – Beaton, A.E. & Tukey, J.W.
Quantile Loss / Pinball Loss (1978) – Asymmetrically penalizes over/under-predictions for quantile regression and uncertainty estimation. 📄 Regression Quantiles – Koenker, R. & Bassett, G.
Smooth L1 Loss (2015) – L2 for small errors, L1 for large errors (Huber with δ=1); standard for bounding box regression.
📄 Fast R-CNN – Girshick, R.
💻 torch.nn.SmoothL1Loss
Wing Loss (2018) – Amplifies small-to-medium range errors for facial landmark localization. 📄 Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks – Feng, Z.-H. et al.
Balanced L1 Loss (2019) – Rebalances inlier vs. outlier loss contributions in object detection regression. 📄 Libra R-CNN: Towards Balanced Learning for Object Detection – Pang, J. et al. 💻 OceanPang/Libra_R-CNN
Adaptive Wing Loss (2019) – Adapts curvature based on ground truth heatmap values for face alignment. 📄 Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression – Wang, X. et al. 💻 protossw512/AdaptiveWingLoss
Log-Cosh Loss (2022) – Approximates Huber loss using log(cosh(x)); twice differentiable everywhere. 📄 Statistical Properties of the Log-Cosh Loss Function Used in Machine Learning – Chen, K. et al.
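The asymmetry of the pinball loss is easy to see in a few lines. A minimal sketch (the name `pinball_loss` and the default q=0.9 are illustrative choices):

```python
import torch

def pinball_loss(pred, target, q=0.9):
    """Quantile (pinball) loss: under-prediction is weighted by q and
    over-prediction by (1 - q), so the minimizer is the q-th quantile."""
    err = target - pred
    return torch.mean(torch.max(q * err, (q - 1) * err))
```

With q=0.9, missing low costs 0.9 per unit while missing high costs only 0.1, which pushes the model toward the 90th percentile of the target distribution.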
Sensitivity-Specificity Loss (2015) – Weighted combination of sensitivity and specificity for extreme class imbalance in lesion segmentation. 📄 Deep Convolutional Encoder Networks for Multiple Sclerosis Lesion Segmentation – Brosch et al.
Dice Loss (2016) – Directly optimizes the Dice coefficient (F1 score); robust to class imbalance. 📄 V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation – Milletari, F. et al. 💻 JunMa11/SegLossOdyssey
Generalized Dice Loss (GDL) (2017) – Per-class volume weighting for multi-class segmentation with highly imbalanced labels. 📄 Generalised Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations – Sudre, C.H. et al.
Tversky Loss (2017) – Tunable α/β parameters controlling the FP/FN trade-off; useful for small lesion segmentation. 📄 Tversky Loss Function for Image Segmentation Using 3D Fully Convolutional Deep Networks – Salehi, S.S.M. et al.
Lovász-Softmax Loss (2018) – Tractable convex surrogate for directly optimizing the Jaccard index (IoU). 📄 The Lovász-Softmax Loss: A Tractable Surrogate for the Optimization of the Intersection-Over-Union Measure – Berman, M. et al. 💻 bermanmaxim/LovaszSoftmax
Exponential Logarithmic Loss (2018) – Combines exponentially weighted focal-style Dice and CE for very small structures. 📄 3D Segmentation with Exponential Logarithmic Loss for Highly Unbalanced Object Sizes – Wong et al.
Asymmetric Similarity Loss (2018) – Asymmetric Fβ-score-based similarity to balance precision and recall. 📄 Asymmetric Loss Functions and Deep Densely Connected Networks for Highly Imbalanced Medical Image Segmentation – Hashemi et al.
Focal Tversky Loss (2019) – Focal-style exponent on Tversky loss to focus on hard, misclassified regions. 📄 A Novel Focal Tversky Loss Function with Improved Attention U-Net for Lesion Segmentation – Abraham, N. & Khan, N.M.
Boundary Loss (2019) – Distance metric on contour space rather than region overlap; effective for highly unbalanced tasks. 📄 Boundary Loss for Highly Unbalanced Segmentation – Kervadec, H. et al. 💻 LIVIAETS/boundary-loss
Hausdorff Distance Loss (2019) – Directly optimizes the Hausdorff distance between predicted and ground-truth boundaries. 📄 Reducing the Hausdorff Distance in Medical Image Segmentation with Convolutional Neural Networks – Karimi, D. & Salcudean, S.E.
Combo Loss (2019) – Weighted combination of modified CE and Dice loss for input and output class imbalance. 📄 Combo Loss: Handling Input and Output Imbalance in Multi-Organ Segmentation – Taghanaki et al.
Region Mutual Information (RMI) Loss (2019) – Maximizes mutual information between predicted and ground-truth label regions. 📄 Region Mutual Information Loss for Semantic Segmentation – Zhao et al. 💻 ZJULearning/RMI
Topological Loss (2019) – Uses persistent homology to enforce correct topological structure in segmentation. 📄 Topology-Preserving Deep Image Segmentation – Hu et al. 💻 HuXiaoling/TopoLoss
Log-Cosh Dice Loss (2020) – Log-cosh smoothing on Dice loss for smoother gradients and stable training. 📄 A Survey of Loss Functions for Semantic Segmentation – Jadon, S.
clDice (2021) – Topology-preserving loss for tubular structures; computes Dice on skeletonized centerlines. 📄 clDice – A Novel Topology-Preserving Loss Function for Tubular Structure Segmentation – Shit et al. 💻 jocpae/clDice
Unified Focal Loss (2022) – Hierarchical framework generalizing Dice-based and CE-based losses with focal modulation. 📄 Unified Focal Loss: Generalising Dice and Cross Entropy-Based Losses to Handle Class Imbalanced Medical Image Segmentation – Yeung et al. 💻 mlyg/unified-focal-loss
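The soft Dice loss at the root of this family can be sketched in a few lines of PyTorch (the name `dice_loss` and the additive smoothing constant `eps` are illustrative choices; many implementations differ in how they smooth and reduce):

```python
import torch

def dice_loss(logits, targets, eps=1.0):
    """Soft Dice loss (Milletari et al., 2016): 1 - 2|P∩G| / (|P| + |G|),
    computed on sigmoid probabilities with additive smoothing."""
    probs = torch.sigmoid(logits)
    dims = tuple(range(1, probs.dim()))           # reduce all but the batch dim
    inter = (probs * targets).sum(dims)
    denom = probs.sum(dims) + targets.sum(dims)
    return (1 - (2 * inter + eps) / (denom + eps)).mean()
```

Because both numerator and denominator scale with foreground size, the loss stays informative even when the foreground covers a tiny fraction of the image.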
Smooth L1 Loss (2015) – Piecewise L2/L1 loss; standard for bounding box regression. 📄 Fast R-CNN – Girshick, R.
IoU Loss (2016) – Directly regresses Intersection-over-Union between predicted and ground-truth boxes. 📄 UnitBox: An Advanced Object Detection Network – Yu et al.
Focal Loss (2017) – Modulating factor (1−p_t)^γ down-weights easy negatives in dense detection. 📄 Focal Loss for Dense Object Detection – Lin, T.-Y. et al. 💻 facebookresearch/detectron2
Bounded IoU Loss (2018) – Upper-bounds IoU change per coordinate for stable high-IoU refinement. 📄 Improving Object Localization with Fitness NMS and Bounded IoU Loss – Tychsen-Smith & Petersson
GIoU Loss (2019) – Extends IoU with a penalty based on the smallest enclosing box; enables gradient flow for non-overlapping boxes. 📄 Generalized Intersection over Union – Rezatofighi et al.
DIoU Loss (2020) – Adds normalized center-point distance penalty to IoU for faster convergence. 📄 Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression – Zheng et al. 💻 Zzh-tju/DIoU
CIoU Loss (2020) – Extends DIoU with aspect ratio consistency penalty for complete geometric alignment. 📄 Distance-IoU Loss – Zheng et al.
Alpha-IoU Loss (2021) – Power parameter α amplifies loss and gradient for high-quality anchors. 📄 Alpha-IoU: A Family of Power Intersection over Union Losses – He et al. 💻 Jacobi93/Alpha-IoU
EIoU Loss (2022) – Decomposes CIoU penalty into separate width/height terms. 📄 Focal and Efficient IOU Loss for Accurate Bounding Box Regression – Zhang et al.
SIoU Loss (2022) – Angle-aware penalty considering vector direction between predicted and target boxes. 📄 SIoU Loss: More Powerful Learning for Bounding Box Regression – Gevorgyan
WIoU Loss (2023) – Dynamic non-monotonic focusing mechanism based on outlier degree. 📄 Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism – Tong et al. 💻 Instinct323/Wise-IoU
MPDIoU Loss (2023) – Bounding box similarity via minimum point distances between corners. 📄 MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression – Ma & Xu
Inner-IoU Loss (2023) – IoU through auxiliary inner bounding boxes with a scaling factor. 📄 Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box – Zhang et al.
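A minimal sketch of the GIoU loss that started this line of IoU extensions, for boxes in `[x1, y1, x2, y2]` format (the function name and layout are choices made here; it assumes valid, non-degenerate boxes):

```python
import torch

def giou_loss(pred, target):
    """GIoU loss (Rezatofighi et al., 2019): 1 - IoU + |C \\ (A∪B)| / |C|,
    where C is the smallest box enclosing both A and B."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union
    # smallest enclosing box C
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c_area = cw * ch
    giou = iou - (c_area - union) / c_area
    return (1 - giou).mean()
```

Unlike plain IoU loss, the enclosing-box term keeps the gradient non-zero even when the two boxes do not overlap at all, so far-apart predictions are still pulled toward the target.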
Minimax / Original GAN Loss (2014) – Discriminator maximizes, generator minimizes binary cross-entropy in a two-player minimax game. 📄 Generative Adversarial Nets – Goodfellow et al.
Non-Saturating GAN Loss (2014) – Generator maximizes log(D(G(z))) instead of minimizing log(1−D(G(z))), providing stronger early gradients. 📄 Generative Adversarial Nets – Goodfellow et al.
Feature Matching Loss (2016) – Generator matches expected feature statistics at an intermediate discriminator layer. 📄 Improved Techniques for Training GANs – Salimans et al.
Least Squares GAN Loss (LSGAN) (2017) – L2 objective minimizing Pearson χ² divergence for more stable training. 📄 Least Squares Generative Adversarial Networks – Mao et al.
Wasserstein Loss (WGAN) (2017) – Earth Mover's distance providing meaningful gradients even for non-overlapping distributions. 📄 Wasserstein GAN – Arjovsky, M. et al. 💻 martinarjovsky/WassersteinGAN
WGAN-GP (2017) – Gradient penalty replacing weight clipping for better Lipschitz constraint enforcement. 📄 Improved Training of Wasserstein GANs – Gulrajani et al.
Hinge Loss GAN (2017) – Max-margin formulation with bounded gradients; used in BigGAN, SAGAN. 📄 Geometric GAN – Lim & Ye 📄 Spectral Normalization for GANs – Miyato et al.
Spectral Normalization (2018) – Constrains spectral norm of weight matrices to stabilize discriminator training. 📄 Spectral Normalization for Generative Adversarial Networks – Miyato et al.
R1 Regularization (2018) – Zero-centered gradient penalty on real data for local convergence guarantees. 📄 Which Training Methods for GANs do actually Converge? – Mescheder et al. 💻 NVlabs/stylegan2-ada-pytorch
Relativistic GAN Loss (RaGAN) (2018) – Discriminator estimates probability that real data is more realistic than fake. 📄 The Relativistic Discriminator – Jolicoeur-Martineau, A.
Mode Seeking Loss (2019) – Maximizes image/latent distance ratio to encourage diverse mode exploration. 📄 Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis – Mao et al.
Path Length Regularization (2020) – Consistent Jacobian norm across latent space for smooth interpolations. 📄 Analyzing and Improving the Image Quality of StyleGAN – Karras et al. 💻 NVlabs/stylegan2-ada-pytorch
LeCam Regularization (2021) – LeCam divergence-based stabilization under limited data. 📄 Regularizing Generative Adversarial Networks under Limited Data – Tseng et al. 💻 google/lecam-gan
Projected GAN Loss (2021) – Multi-scale discrimination in projected feature space from pretrained networks. 📄 Projected GANs Converge Faster – Sauer et al.
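The hinge formulation used by SAGAN/BigGAN is short enough to write out directly. A minimal sketch (the function names are illustrative; `real_logits`/`fake_logits` are raw discriminator outputs):

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(real_logits, fake_logits):
    """Hinge GAN discriminator loss (Lim & Ye, 2017): push real logits
    above +1 and fake logits below -1; gradients vanish past the margin."""
    return F.relu(1 - real_logits).mean() + F.relu(1 + fake_logits).mean()

def g_hinge_loss(fake_logits):
    """Generator side: simply maximize the discriminator score on fakes."""
    return -fake_logits.mean()
```

The margin caps the discriminator's incentive on already-separated samples, which is part of why this loss trains more stably than the saturating minimax objective.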
ELBO / VAE Loss (2013) – Reconstruction loss + KL divergence regularizer pushing posterior toward prior. 📄 Auto-Encoding Variational Bayes – Kingma, D.P. & Welling, M. 💻 AntixK/PyTorch-VAE
β-VAE Loss (2017) – Upweights KL divergence (β > 1) for more disentangled latent representations. 📄 β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework – Higgins et al.
VQ-VAE Loss (2017) – Reconstruction + vector quantization commitment loss + codebook loss for discrete latents. 📄 Neural Discrete Representation Learning – van den Oord et al.
WAE Loss (2018) – Penalized Wasserstein distance using MMD or adversarial regularization on latent space. 📄 Wasserstein Auto-Encoders – Tolstikhin et al.
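The negative ELBO with the closed-form Gaussian KL term can be sketched as follows (a minimal illustration assuming a Gaussian posterior and an MSE reconstruction term; setting `beta > 1` gives the β-VAE variant):

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, beta=1.0):
    """Negative ELBO (Kingma & Welling, 2013): reconstruction error plus
    KL(q(z|x) || N(0, I)), using the closed form for diagonal Gaussians."""
    recon = F.mse_loss(recon_x, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

When the encoder outputs mu = 0 and logvar = 0 the posterior already equals the prior, so the KL term is exactly zero and only reconstruction error remains.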
Denoising Score Matching (2011) – Training a denoising autoencoder equals matching the score function of noise-perturbed data. 📄 A Connection Between Score Matching and Denoising Autoencoders – Vincent, P.
Score Matching with Langevin Dynamics (NCSN) (2019) – Noise-conditional score network across multiple noise scales with annealed Langevin sampling. 📄 Generative Modeling by Estimating Gradients of the Data Distribution – Song, Y. & Ermon, S.
DDPM Loss (2020) – Simplified variational bound: predict the noise added at each diffusion step via weighted MSE. 📄 Denoising Diffusion Probabilistic Models – Ho, J. et al.
Variational Diffusion Loss (2021) – Continuous-time variational lower bound with learnable noise schedule. 📄 Variational Diffusion Models – Kingma et al.
v-prediction Loss (2022) – Predicts velocity v = α·ε − σ·x for improved numerical stability and progressive distillation. 📄 Progressive Distillation for Fast Sampling of Diffusion Models – Salimans, T. & Ho, J.
Rectified Flow Loss (2022) – Learns straight-line ODE trajectories between noise and data distributions. 📄 Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow – Liu et al.
Flow Matching Loss (2023) – Simulation-free training for continuous normalizing flows; regresses vector fields of conditional probability paths. 📄 Flow Matching for Generative Modeling – Lipman et al. 💻 facebookresearch/flow_matching
Consistency Loss (2023) – Self-consistency along the probability flow ODE for high-quality one-step generation. 📄 Consistency Models – Song et al. 💻 OpenAI/consistency_models
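One training step of the simplified DDPM objective can be sketched as below (an illustration, not a full training loop; `model` is any callable `(x_t, t) -> predicted noise`, and `alphas_cumprod` is a precomputed cumulative-product noise schedule):

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0, alphas_cumprod, t):
    """Simplified DDPM loss (Ho et al., 2020): sample x_t from the closed-form
    forward process, then regress the injected noise with plain MSE."""
    noise = torch.randn_like(x0)
    # broadcast the per-sample alpha-bar over all non-batch dimensions
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return F.mse_loss(model(x_t, t), noise)
```

The elegance of this objective is that the forward process has a closed form, so any timestep can be sampled directly without simulating the whole diffusion chain.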
SSIM Loss (2004) – Structural similarity using luminance, contrast, and structure comparisons; used as 1−SSIM. 📄 Image Quality Assessment: From Error Visibility to Structural Similarity – Wang et al. 💻 VainF/pytorch-msssim
Style Loss (Gram Matrix) (2015) – Matches Gram matrices of CNN feature maps for texture/style transfer. 📄 A Neural Algorithm of Artistic Style – Gatys et al.
Perceptual Loss / VGG Loss (2016) – L2 distance between deep feature representations of generated and target images. 📄 Perceptual Losses for Real-Time Style Transfer and Super-Resolution – Johnson et al.
LPIPS (2018) – Learned perceptual metric using calibrated deep features; correlates better with human perception than SSIM/PSNR. 📄 The Unreasonable Effectiveness of Deep Features as a Perceptual Metric – Zhang et al. 💻 richzhang/PerceptualSimilarity
Charbonnier Loss (1994) – Differentiable approximation to L1 (√(x² + ε²)); robust to outliers, smooth at zero. 📄 Two Deterministic Half-Quadratic Regularization Algorithms for Computed Imaging – Charbonnier et al.
MS-SSIM Loss (2003) – Multi-scale SSIM evaluating structural similarity across multiple resolutions. 📄 Multi-Scale Structural Similarity for Image Quality Assessment – Wang et al.
SRGAN Loss (2017) – Adversarial loss + VGG perceptual content loss for photo-realistic 4× super-resolution. 📄 Photo-Realistic Single Image Super-Resolution Using a GAN – Ledig et al.
Contextual Loss (2018) – Feature-level context matching without spatial alignment; enables training with non-aligned data. 📄 The Contextual Loss for Image Transformation with Non-Aligned Data – Mechrez et al.
ESRGAN Loss (2018) – Relativistic average discriminator + pre-activation VGG perceptual loss for enhanced texture recovery. 📄 ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks – Wang et al. 💻 xinntao/ESRGAN
Focal Frequency Loss (2021) – Adaptively focuses on hard-to-synthesize frequencies in the Fourier domain. 📄 Focal Frequency Loss for Image Reconstruction and Synthesis – Jiang et al. 💻 EndlessSora/focal-frequency-loss
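The Charbonnier loss from this section is a one-liner worth seeing explicitly; a minimal sketch (the default eps=1e-3 is a common but illustrative choice):

```python
import torch

def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier loss: sqrt(x^2 + eps^2), a smooth, outlier-robust
    approximation to L1 widely used in super-resolution and restoration."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()
```

Note the floor at eps: even a perfect prediction yields a loss of eps, which is the price paid for a gradient that stays well-defined at zero error.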
Contrastive Loss (2005) – Pairwise loss pulling similar pairs together and pushing dissimilar pairs apart by a margin. 📄 Learning a Similarity Metric Discriminatively, with Application to Face Verification – Chopra, Hadsell, LeCun
N-pair Loss (2016) – Generalizes triplet loss by simultaneously pushing away negatives from N−1 classes. 📄 Improved Deep Metric Learning with Multi-class N-pair Loss Objective – Sohn, K.
InfoNCE / CPC Loss (2018) – Noise-contrastive estimation maximizing mutual information between latent representations. 📄 Representation Learning with Contrastive Predictive Coding – van den Oord et al. 💻 RElbers/info-nce-pytorch
MoCo Loss (2020) – InfoNCE with momentum-updated encoder and dynamic dictionary queue. 📄 Momentum Contrast for Unsupervised Visual Representation Learning – He et al. 💻 facebookresearch/moco
NT-Xent / SimCLR Loss (2020) – Normalized temperature-scaled cross-entropy over cosine similarities of augmented pairs. 📄 A Simple Framework for Contrastive Learning of Visual Representations – Chen et al.
BYOL Loss (2020) – MSE between L2-normalized predictions and targets; learns without negative pairs via momentum teacher. 📄 Bootstrap Your Own Latent – Grill et al.
SwAV Loss (2020) – Swapped prediction contrasting cluster assignments from different augmented views. 📄 Unsupervised Learning of Visual Features by Contrasting Cluster Assignments – Caron et al. 💻 facebookresearch/swav
Supervised Contrastive Loss (SupCon) (2020) – Extends self-supervised contrastive loss with label information to pull same-class embeddings together. 📄 Supervised Contrastive Learning – Khosla et al. 💻 HobbitLong/SupContrast
Barlow Twins Loss (2021) – Cross-correlation matrix close to identity; reduces redundancy between embedding dimensions. 📄 Barlow Twins: Self-Supervised Learning via Redundancy Reduction – Zbontar et al. 💻 facebookresearch/barlowtwins
DINO Loss (2021) – Self-distillation via cross-entropy between sharpened softmax outputs of student and momentum-teacher. 📄 Emerging Properties in Self-Supervised Vision Transformers – Caron et al. 💻 facebookresearch/dino
SimSiam Loss (2021) – Negative cosine similarity with stop-gradient; no negatives, momentum, or large batches needed. 📄 Exploring Simple Siamese Representation Learning – Chen & He
CLIP Loss (2021) – Symmetric cross-entropy over image-text cosine similarities aligning visual and language representations. 📄 Learning Transferable Visual Models From Natural Language Supervision – Radford et al. 💻 mlfoundations/open_clip
VICReg Loss (2022) – Variance + invariance + covariance regularization preventing collapse without negatives. 📄 VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning – Bardes et al. 💻 facebookresearch/vicreg
Decoupled Contrastive Loss (2022) – Removes positive term from InfoNCE denominator, eliminating negative-positive coupling. 📄 Decoupled Contrastive Learning – Yeh et al.
DINOv2 Loss (2023) – DINO self-distillation + iBOT masked image modeling + Sinkhorn centering at scale. 📄 DINOv2: Learning Robust Visual Features without Supervision – Oquab et al. 💻 facebookresearch/dinov2
SigLIP Loss (2023) – Pairwise sigmoid loss replacing softmax for efficient batch-parallel language-image pre-training. 📄 Sigmoid Loss for Language Image Pre-Training – Zhai et al.
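The InfoNCE core shared by most of these methods reduces to a cross-entropy over in-batch similarities. A minimal sketch for paired embeddings (the name `info_nce` and the default temperature 0.1 are illustrative; SimCLR-style symmetrization and CLIP's two-directional variant build on the same idea):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE over two batches of paired embeddings: each z1[i] must pick
    its positive z2[i] out of all rows of z2 (in-batch negatives)."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # cosine similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```

Framing it as classification is what makes the batch size matter: every extra row adds one more negative to the denominator.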
Triplet Loss (2015) – Minimizes anchor-positive distance while maximizing anchor-negative distance by a margin. 📄 FaceNet: A Unified Embedding for Face Recognition and Clustering – Schroff et al. 💻 KevinMusgrave/pytorch-metric-learning
Lifted Structured Loss (2016) – Mines all positive and negative pairs in a batch simultaneously. 📄 Deep Metric Learning via Lifted Structured Feature Embedding – Oh Song et al.
SphereFace / A-Softmax (2017) – Multiplicative angular margin on a hypersphere for discriminative face features. 📄 SphereFace: Deep Hypersphere Embedding for Face Recognition – Liu et al.
Proxy-NCA Loss (2017) – Data-to-proxy comparisons with one learnable proxy per class; dramatically faster convergence. 📄 No Fuss Distance Metric Learning Using Proxies – Movshovitz-Attias et al.
CosFace / LMCL (2018) – Cosine margin penalty on target logit in normalized softmax. 📄 CosFace: Large Margin Cosine Loss for Deep Face Recognition – Wang et al. 💻 deepinsight/insightface
ArcFace (2019) – Additive angular margin with clear geodesic distance interpretation. 📄 ArcFace: Additive Angular Margin Loss for Deep Face Recognition – Deng et al. 💻 deepinsight/insightface
Multi-Similarity Loss (2019) – Mines and weights pairs using self-similarity, relative similarity, and negative similarity. 📄 Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning – Wang et al.
SoftTriple Loss (2019) – Multiple centers per class bridging proxy-based and triplet-based losses. 📄 SoftTriple Loss: Deep Metric Learning Without Triplet Sampling – Qian et al.
Circle Loss (2020) – Unified pair similarity optimization with self-paced weighting. 📄 Circle Loss: A Unified Perspective of Pair Similarity Optimization – Sun et al.
Proxy Anchor Loss (2020) – Proxies as anchors associated with all batch data; fast convergence. 📄 Proxy Anchor Loss for Deep Metric Learning – Kim et al. 💻 tjddus9597/Proxy-Anchor-CVPR2020
Sub-center ArcFace (2020) – Multiple sub-centers per class for noisy label handling. 📄 Sub-center ArcFace: Boosting Face Recognition by Large-Scale Noisy Web Faces – Deng et al.
AdaFace (2022) – Adaptive margin emphasizing hard or easy samples based on image quality. 📄 AdaFace: Quality Adaptive Margin for Face Recognition – Kim et al. 💻 mk-minchul/AdaFace
ElasticFace (2022) – Random margin values from a normal distribution each iteration for flexible separability. 📄 ElasticFace: Elastic Margin Loss for Deep Face Recognition – Boutros et al. 💻 fdbtrs/ElasticFace
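The additive angular margin mechanism shared by this family can be sketched in a few lines (an illustration only; the name `arcface_logits` is a choice made here, and production implementations such as insightface add easy-margin handling and other numerical safeguards):

```python
import torch
import torch.nn.functional as F

def arcface_logits(features, weight, labels, s=64.0, m=0.5):
    """ArcFace (Deng et al., 2019): add angular margin m to the target-class
    angle on the unit hypersphere, scale by s, then apply cross-entropy."""
    cos = F.normalize(features, dim=1) @ F.normalize(weight, dim=1).t()
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))   # angles in [0, pi]
    target = F.one_hot(labels, weight.size(0)).bool()
    cos_m = torch.where(target, torch.cos(theta + m), cos)
    return F.cross_entropy(s * cos_m, labels)
```

Because the margin shrinks only the target logit, the loss is strictly harder than plain normalized-softmax cross-entropy, which is what forces tighter angular clustering per identity.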
Cross-Entropy / Next Token Prediction β Standard autoregressive LM loss; foundation of GPT and all causal LMs. π Language Models are Unsupervised Multitask Learners β Radford et al. (GPT-2, 2019)
Masked Language Model (MLM) Loss (2019) β Masks 15% of tokens and predicts from bidirectional context. Introduced pre-train/fine-tune for NLU. π BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding β Devlin et al.
Replaced Token Detection (RTD) (2020) β Discriminator classifies every token as original or replaced; loss defined over all tokens for better sample efficiency. π ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators β Clark et al.
Sentence Order Prediction (SOP) (2020) β Predicts whether two consecutive segments are in correct or swapped order. π ALBERT: A Lite BERT for Self-supervised Learning β Lan et al.
Span Corruption Loss (2020) β Masks contiguous spans; encoder-decoder reconstructs only missing spans. All NLP tasks as text-to-text. π Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer β Raffel et al.
Mixture of Denoisers (MoD) (2022) β Unifies causal LM, prefix LM, and span corruption into a single pre-training objective. π UL2: Unifying Language Learning Paradigms β Tay et al.
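The next-token-prediction loss underlying all causal LMs is just shifted cross-entropy. A minimal sketch (the function name is illustrative; real training code additionally masks padding tokens):

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, input_ids):
    """Causal LM loss: shift by one so position t predicts token t+1,
    then apply token-level cross-entropy over the vocabulary."""
    shift_logits = logits[:, :-1, :].contiguous()   # drop the last position
    shift_labels = input_ids[:, 1:].contiguous()    # drop the first token
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
```

The one-position shift is the entire trick: each output distribution is scored against the token that actually follows it in the sequence.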
PPO Loss / RLHF (2017/2022) β Clipped surrogate objective for aligning LLMs with human preferences via a learned reward model. π Proximal Policy Optimization Algorithms β Schulman et al. π Training language models to follow instructions with human feedback β Ouyang et al. π» huggingface/trl
Reward Model Loss / Bradley-Terry (2022) β Cross-entropy on pairwise human preferences for training scalar reward models. π Training language models to follow instructions with human feedback β Ouyang et al.
SLiC-HF Loss (2023) β Contrastive ranking loss calibrating sequence likelihoods to human preferences. π SLiC-HF: Sequence Likelihood Calibration with Human Feedback β Zhao et al.
DPO Loss (2023) β Closed-form policy optimization directly from preference pairs; no separate reward model or RL loop. π Direct Preference Optimization: Your Language Model is Secretly a Reward Model β Rafailov et al. π» huggingface/trl β DPOTrainer
IPO Loss (2023) β Squared loss on preference margins avoiding overfitting to Bradley-Terry assumption. π A General Theoretical Paradigm to Understand Learning from Human Preferences β Azar et al.
CPO Loss (2024) β Contrastive preference loss without reference model for machine translation. π Contrastive Preference Optimization β Xu et al.
KTO Loss (2024) β Kahneman-Tversky prospect theory applied to alignment; works from binary (good/bad) feedback. π KTO: Model Alignment as Prospect Theoretic Optimization β Ethayarajh et al. π» huggingface/trl β KTOTrainer
GRPO Loss (2024) β Group Relative Policy Optimization; estimates advantages from sampled output groups, eliminating the critic model. π DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models β Shao et al.
ORPO Loss (2024) β Odds-ratio penalty added to SFT loss; combines instruction tuning and preference alignment in one stage. π ORPO: Monolithic Preference Optimization without Reference Model β Hong et al. π» huggingface/trl β ORPOTrainer
SimPO Loss (2024) β Reference-free preference optimization using length-normalized average log probability as implicit reward. π SimPO: Simple Preference Optimization with a Reference-Free Reward β Meng et al. π» princeton-nlp/SimPO
SPPO Loss (2024) β Self-play preference optimization framing alignment as a two-player constant-sum game. π Self-Play Preference Optimization for Language Model Alignment β Wu et al.
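To make the preference-optimization pattern concrete, here is a minimal pure-Python sketch of the DPO objective above. It operates on per-sequence log-probabilities; the function name and scalar inputs are illustrative (real trainers such as TRL's `DPOTrainer` compute these from batched token-level log-probs):

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO: -log sigmoid(beta * [(log pi_w - log ref_w) - (log pi_l - log ref_l)]).
    *_w / *_l are log-probs of the chosen (winner) and rejected (loser) responses."""
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    # -log sigmoid(m) == log(1 + exp(-m))
    return math.log1p(math.exp(-margin))
```

When policy and reference agree, the margin is zero and the loss is log 2; as the policy raises the chosen response relative to the rejected one, the loss decreases.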
CTC Loss (2006) β Marginalizes over all valid alignments between input and output sequences; foundational for ASR. π Connectionist Temporal Classification β Graves et al. π» torch.nn.CTCLoss
RNN-T Loss (2012) β Extends CTC with a prediction network conditioning on previous outputs for streaming transduction. π Sequence Transduction with Recurrent Neural Networks β Graves, A. π» torchaudio.transforms.RNNTLoss
Scheduled Sampling Loss (2015) β Gradually replaces ground-truth tokens with model predictions during training to mitigate exposure bias. π Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks β Bengio et al.
Sequence-Level Training / MIXER (2016) β Directly optimizes BLEU/ROUGE using REINFORCE. π Sequence Level Training with Recurrent Neural Networks β Ranzato et al.
Minimum Risk Training (2016) β Minimizes expected task-level loss (e.g., 1βBLEU) via sampling. π Minimum Risk Training for Neural Machine Translation β Shen et al.
Mel-Spectrogram Reconstruction Loss (2017) β L1/L2 between predicted and target mel-spectrograms; primary TTS training objective. π Tacotron: Towards End-to-End Speech Synthesis β Wang et al.
Multi-Resolution STFT Loss (2020) β Spectral convergence + log-magnitude STFT at multiple FFT sizes for neural vocoder training. π Parallel WaveGAN β Yamamoto et al. π» csteinmetz1/auraloss
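The CTC marginalization listed above is a forward dynamic program over a blank-interleaved target. A toy probability-space sketch (illustrative only — production implementations like `torch.nn.CTCLoss` work in log space for numerical stability):

```python
import math

def ctc_loss(probs, target, blank=0):
    """Negative log total probability of all alignments of `target` in `probs`.
    probs: list of T per-frame distributions over K symbols; target: label ids."""
    ext = [blank]
    for c in target:
        ext += [c, blank]              # interleave blanks: [_, l1, _, l2, _, ...]
    T, S = len(probs), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]                       # stay on same symbol
            if s >= 1:
                a += alpha[t - 1][s - 1]              # advance one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]              # skip a blank
            alpha[t][s] = a * probs[t][ext[s]]
    total = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return -math.log(total)
```

For a one-frame input with P(label) = 0.7, the loss is exactly -log 0.7, since the only valid alignment emits the label directly.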
TD Loss / Temporal Difference (1988) β Bootstrapped value estimation updating predictions toward reward + discounted next-state value. π Learning to Predict by the Methods of Temporal Differences β Sutton, R.S.
Q-Learning Loss (1989) β Off-policy TD control bootstrapping with max Q-value over next actions. π Learning from Delayed Rewards β Watkins, C.J.C.H.
REINFORCE / Policy Gradient (1992) β Monte Carlo policy gradient weighted by returns. π Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning β Williams, R.J.
DQN Loss (2015) β Q-learning with deep networks, experience replay, and target networks. π Human-level Control through Deep Reinforcement Learning β Mnih et al. π» DLR-RM/stable-baselines3
Double DQN Loss (2015) β Decouples action selection from evaluation to reduce overestimation bias. π Deep Reinforcement Learning with Double Q-learning β van Hasselt et al.
DDPG Loss (2015) β Deterministic policy gradients for continuous control with experience replay. π Continuous Control with Deep Reinforcement Learning β Lillicrap et al.
GAE (2015) β Exponentially-weighted multi-step TD errors for tunable bias-variance tradeoff. π High-Dimensional Continuous Control Using Generalized Advantage Estimation β Schulman et al.
A3C / A2C Loss (2016) β Actor-critic with policy gradient + value function baseline + entropy bonus. π Asynchronous Methods for Deep Reinforcement Learning β Mnih et al.
Distributional RL / C51 Loss (2017) β Models full return distribution using categorical projection over fixed atoms. π A Distributional Perspective on Reinforcement Learning β Bellemare et al.
PPO Clipped Surrogate Loss (2017) β Clips probability ratio to prevent destructively large policy updates. π Proximal Policy Optimization Algorithms β Schulman et al. π» DLR-RM/stable-baselines3
HER Loss (2017) β Relabels failed trajectories with achieved goals for sample-efficient sparse-reward learning. π Hindsight Experience Replay β Andrychowicz et al.
QR-DQN Loss (2018) β Quantile regression approximating the return distribution with learnable quantile locations. π Distributional Reinforcement Learning with Quantile Regression β Dabney et al.
SAC Loss (2018) β Maximum entropy actor-critic balancing exploration and exploitation automatically. π Soft Actor-Critic β Haarnoja et al.
TD3 Loss (2018) β Clipped double-Q learning + delayed policy updates + target policy smoothing. π Addressing Function Approximation Error in Actor-Critic Methods β Fujimoto et al.
V-trace Loss (2018) β Importance-weighted off-policy correction for scalable distributed RL (IMPALA). π IMPALA: Scalable Distributed Deep-RL β Espeholt et al.
Decision Transformer Loss (2021) β RL as sequence modeling; autoregressive transformer conditioned on returns, trained with supervised loss. π Decision Transformer: Reinforcement Learning via Sequence Modeling β Chen et al. π» kzl/decision-transformer
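Most value-based entries above share the same TD bootstrap at their core. A minimal sketch of the DQN-style squared TD error for a single transition (dict-based tabular Q-values here are purely illustrative; DQN uses a neural network and batches):

```python
def dqn_loss(q, q_target, transition, gamma=0.99):
    """Squared TD error for one (s, a, r, s', done) transition.
    q / q_target: mappings from state -> list of per-action values."""
    s, a, r, s2, done = transition
    target = r if done else r + gamma * max(q_target[s2])   # bootstrap target
    td_error = q[s][a] - target
    return td_error ** 2
```

Double DQN changes only the target line (argmax from `q`, value from `q_target`); distributional variants replace the scalar target with a projected return distribution.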
Knowledge Distillation / KD Loss (2015) β Student matches softened output distribution of teacher via KL divergence at elevated temperature. π Distilling the Knowledge in a Neural Network β Hinton, Vinyals, Dean
FitNets / Hint Loss (2015) β Student mimics intermediate feature representations of teacher. π FitNets: Hints for Thin Deep Nets β Romero et al.
Attention Transfer Loss (2017) β Forces student to mimic spatial attention maps of teacher's intermediate layers. π Paying More Attention to Attention β Zagoruyko & Komodakis π» szagoruyko/attention-transfer
Born-Again Networks (2018) β Self-distillation where identical-architecture student outperforms teacher. π Born Again Neural Networks β Furlanello et al.
PKT / Probabilistic KD (2018) β Matches probability distributions in feature space rather than raw representations. π Learning Deep Representations with Probabilistic Knowledge Transfer β Passalis & Tefas
Relational KD / RKD (2019) β Transfers mutual relations (distances and angles) between examples. π Relational Knowledge Distillation β Park et al.
Self-Distillation Loss (2019) β Deeper layers supervise shallower classifiers within the same network. π Be Your Own Teacher β Zhang et al.
CRD / Contrastive Representation Distillation (2020) β Maximizes mutual information between teacher and student via contrastive objective. π Contrastive Representation Distillation β Tian et al. π» HobbitLong/RepDistiller
ReviewKD (2021) β Student's lower-level features guided by teacher's higher-level features through attention-based fusion. π Distilling Knowledge via Knowledge Review β Chen et al. π» dvlab-research/ReviewKD
DKD / Decoupled KD (2022) β Decouples KD into target-class and non-target-class components for independent weighting. π Decoupled Knowledge Distillation β Zhao et al. π» megvii-research/mdistiller
DIST Loss (2022) β Preserves inter-class relations and intra-class ranking rather than exact probability matching. π Knowledge Distillation from A Stronger Teacher β Huang et al. π» hunto/DIST_KD
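The soft-target term of Hinton-style KD can be sketched in a few lines of pure Python (logit lists stand in for model outputs; the T² factor keeps gradient magnitudes comparable across temperatures):

```python
import math

def softmax(logits, T=1.0):
    m = max(logits)                                   # shift for stability
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """T^2 * KL(teacher_T || student_T), the soft-target term of Hinton et al."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice this term is combined with a standard cross-entropy on the hard labels, weighted by a mixing coefficient.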
KL Divergence (1951) β Measures information lost when approximating one distribution with another. π On Information and Sufficiency β Kullback & Leibler
L2 Regularization / Weight Decay (1970) β Penalizes sum of squared weights to prevent overfitting. π Ridge Regression β Hoerl & Kennard
L1 Regularization / Lasso (1996) β Penalizes sum of absolute weights, inducing sparsity. π Regression Shrinkage and Selection via the Lasso β Tibshirani, R.
Elastic Net (2005) β Combines L1 and L2 for sparsity + grouping of correlated features. π Regularization and Variable Selection via the Elastic Net β Zou & Hastie
Dropout (2014) β Randomly zeroes activations; implicit ensemble of exponentially many sub-networks. π Dropout: A Simple Way to Prevent Neural Networks from Overfitting β Srivastava et al.
Confidence Penalty (2017) β Penalizes low-entropy (overconfident) output distributions. π Regularizing Neural Networks by Penalizing Confident Output Distributions β Pereyra et al.
Mixup Loss (2018) β Trains on convex combinations of example pairs and their labels. π mixup: Beyond Empirical Risk Minimization β Zhang et al. π» facebookresearch/mixup-cifar10
Manifold Mixup (2019) β Extends Mixup to hidden representations at random intermediate layers. π Manifold Mixup: Better Representations by Interpolating Hidden States β Verma et al.
CutMix Loss (2019) β Cuts and pastes rectangular patches between images while mixing labels proportionally. π CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features β Yun et al.
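Mixup's loss side is just a convex combination of two per-label losses; a minimal sketch with flat feature lists (the `model` and `ce_loss` callables are placeholders for whatever network and criterion you train):

```python
import random

def mixup_loss(ce_loss, x_a, y_a, x_b, y_b, model, alpha=0.2):
    """Train on lam*x_a + (1-lam)*x_b; mix the label losses by the same lam."""
    lam = random.betavariate(alpha, alpha)            # lam ~ Beta(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    pred = model(x)
    return lam * ce_loss(pred, y_a) + (1 - lam) * ce_loss(pred, y_b)
```

CutMix reuses the exact same mixed-label loss; only the input mixing changes (rectangular patches instead of a global convex combination, with lam set to the patch area ratio).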
Chamfer Distance (2017) β Average nearest-neighbor distance between two point sets; fast and widely used. π A Point Set Generation Network for 3D Object Reconstruction from a Single Image β Fan et al. π» facebookresearch/pytorch3d
Earth Mover's Distance (EMD) (2017) β Optimal transport distance with bijective matching; higher quality but more expensive than CD. π A Point Set Generation Network for 3D Object Reconstruction from a Single Image β Fan et al.
Normal Consistency Loss (2018) β Penalizes inconsistency of surface normals between adjacent mesh faces. π Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images β Wang et al.
Mesh Laplacian Smoothing Loss (2018) β Penalizes vertex deviation from neighbor centroid to prevent self-intersections. π Pixel2Mesh β Wang et al. π» facebookresearch/pytorch3d
SDF Loss (DeepSDF) (2019) β Regresses signed distance values; zero level-set defines the 3D surface. π DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation β Park et al. π» facebookresearch/DeepSDF
Occupancy Loss (2019) β Binary CE on predicted occupancy probabilities for 3D reconstruction. π Occupancy Networks: Learning 3D Reconstruction in Function Space β Mescheder et al.
NeRF Photometric Loss (2020) β MSE between rendered and observed pixel colors via differentiable volume rendering. π NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis β Mildenhall et al.
3D Gaussian Splatting Loss (2023) β L1 + D-SSIM for optimizing anisotropic 3D Gaussians for real-time radiance field rendering. π 3D Gaussian Splatting for Real-Time Radiance Field Rendering β Kerbl et al. π» graphdeco-inria/gaussian-splatting
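Chamfer Distance, the workhorse point-cloud loss above, is short enough to sketch directly (naive O(|A|·|B|) nearest-neighbor search; PyTorch3D provides the batched GPU version):

```python
def chamfer_distance(A, B):
    """Symmetric average nearest-neighbor squared distance between point sets."""
    def sq(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    a2b = sum(min(sq(a, b) for b in B) for a in A) / len(A)
    b2a = sum(min(sq(b, a) for a in A) for b in B) / len(B)
    return a2b + b2a
```

EMD replaces the independent nearest-neighbor matches with a single bijective assignment, which is why it is more faithful but far more expensive.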
Scale-Invariant Loss (2014) β Log-space depth error minus mean shift; invariant to global scale ambiguity. π Depth Map Prediction from a Single Image using a Multi-Scale Deep Network β Eigen et al.
Berhu Loss (Reverse Huber) (2016) β L1 for small residuals, L2 for large; robust depth regression. π Deeper Depth Prediction with Fully Convolutional Residual Networks β Laina et al.
Photometric Consistency Loss (2017) β Self-supervised SSIM + L1 with left-right disparity consistency for monocular depth. π Unsupervised Monocular Depth Estimation with Left-Right Consistency β Godard et al. π» nianticlabs/monodepth2
Edge-Aware Smoothness Loss (2017) β Locally smooth depth except at image edges, weighted by image gradients. π Unsupervised Monocular Depth Estimation with Left-Right Consistency β Godard et al.
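Eigen's scale-invariant loss above has a compact closed form: with d_i = log y_i - log y*_i, it is (1/n)Σd_i² - λ((1/n)Σd_i)². A sketch on flat depth lists:

```python
import math

def scale_invariant_loss(pred, gt, lam=0.5):
    """Eigen et al.: log-space error with the mean log-ratio partially removed."""
    d = [math.log(p) - math.log(g) for p, g in zip(pred, gt)]
    n = len(d)
    return sum(di ** 2 for di in d) / n - lam * (sum(d) / n) ** 2
```

With lam=1.0 a uniform global rescaling of the prediction incurs zero loss, which is exactly the scale ambiguity the loss is designed to ignore; Eigen et al. use lam=0.5 as a compromise.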
Deep Supervision Loss (2015) β Auxiliary losses at intermediate layers providing direct gradient paths. π Deeply-Supervised Nets β Lee et al.
Dice Loss (2016) β Directly optimizes Dice coefficient for volumetric medical image segmentation. π V-Net β Milletari et al.
Generalized Dice Loss (2017) β Per-class volume weighting for highly imbalanced multi-class segmentation. π Generalised Dice Overlap as a Deep Learning Loss Function β Sudre et al.
Tversky Loss (2017) β Tunable FP/FN trade-off for small lesion segmentation. π Tversky Loss Function for Image Segmentation β Salehi et al.
Attention-Gated Loss (2018) β Learned attention gates suppress irrelevant regions in skip connections. π Attention U-Net: Learning Where to Look for the Pancreas β Oktay et al.
Boundary / Surface Loss (2019) β Distance metric on contour space for highly unbalanced medical segmentation. π Boundary Loss for Highly Unbalanced Segmentation β Kervadec et al. π» LIVIAETS/boundary-loss
Distance Map Penalized CE (2019) β Weights CE by distance transform maps to focus on boundary regions. π Distance Map Loss Penalty Term for Semantic Segmentation β Caliva et al.
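The soft Dice loss underlying several entries above, sketched on flattened probability maps (the epsilon guards the empty-mask case; Tversky generalizes the denominator with separate FP/FN weights):

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice: 1 - 2|P∩G| / (|P| + |G|); pred are probabilities in [0, 1]."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)
```

Generalized Dice simply computes this per class and weights each class by the inverse square of its ground-truth volume before summing.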
Variational Graph Auto-Encoder (VGAE) Loss (2016) β Reconstruction BCE on adjacency matrix + KL divergence for unsupervised graph learning. π Variational Graph Auto-Encoders β Kipf & Welling
Node Classification Loss (2017) β Standard cross-entropy per-node in semi-supervised graph settings. π Semi-Supervised Classification with Graph Convolutional Networks β Kipf & Welling π» pyg-team/pytorch_geometric
Deep Graph Infomax (DGI) Loss (2019) β Maximizes mutual information between local node and global graph representations. π Deep Graph Infomax β VeliΔkoviΔ et al. π» PetarV-/DGI
Graph Matching Loss (2019) β Attention-based cross-graph matching with margin-based pairwise loss. π Graph Matching Networks for Learning the Similarity of Graph Structured Objects β Li et al.
InfoGraph Loss (2020) β Maximizes mutual information between graph-level and substructure-level representations. π InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning β Sun et al. π» sunfanyunn/InfoGraph
GraphCL Loss (2020) β NT-Xent contrastive loss on augmented graph views for self-supervised graph learning. π Graph Contrastive Learning with Augmentations β You et al. π» Shen-Lab/GraphCL
BGRL Loss (2022) β Negative-sample-free self-supervised loss bootstrapping graph representations (inspired by BYOL). π Large-Scale Representation Learning on Graphs via Bootstrapping β Thakoor et al. π» nerdslab/bgrl
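GraphCL's NT-Xent term is the same contrastive objective used in SimCLR, applied to two augmented views per graph. A pure-Python sketch over small embedding lists (toy cosine-similarity loop; real code vectorizes this over the batch):

```python
import math

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent over a batch of N graphs with two views each (2N embeddings).
    z1[i] and z2[i] are the positive pair; all other embeddings are negatives."""
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)
    z = list(z1) + list(z2)
    N = len(z1)
    total = 0.0
    for i in range(2 * N):
        j = (i + N) % (2 * N)          # index of this sample's other view
        pos = math.exp(cos(z[i], z[j]) / tau)
        denom = sum(math.exp(cos(z[i], z[k]) / tau) for k in range(2 * N) if k != i)
        total += -math.log(pos / denom)
    return total / (2 * N)
```

BGRL drops the negatives (the denominator) entirely, replacing them with a momentum target network in the style of BYOL.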
ListNet Loss (2007) β Listwise learning-to-rank using top-one probability distributions. π Learning to Rank: From Pairwise Approach to Listwise Approach β Cao et al.
ListMLE Loss (2008) β Listwise loss based on likelihood of ground-truth permutation under Plackett-Luce model. π Listwise Approach to Learning to Rank: Theory and Algorithm β Xia et al.
BPR Loss (2009) β Pairwise loss maximizing posterior probability that user prefers observed over unobserved items. π BPR: Bayesian Personalized Ranking from Implicit Feedback β Rendle et al. π» guoyang9/BPR-pytorch
Sampled Softmax Loss (2015) β Approximates full softmax over large item vocabulary by sampling negatives. π On Using Very Large Target Vocabulary for Neural Machine Translation β Jean et al.
DirectAU Loss (2022) β Directly optimizes alignment and uniformity on the hypersphere for collaborative filtering. π Towards Representation Alignment and Uniformity in Collaborative Filtering β Wang et al. π» THUwangcy/DirectAU
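BPR, the default implicit-feedback loss above, is one line once you have the two item scores (scores here are illustrative scalars; in practice they are user-item dot products):

```python
import math

def bpr_loss(score_pos, score_neg):
    """-log sigmoid(x_ui - x_uj): push the observed item above a sampled negative."""
    # -log sigmoid(m) == log(1 + exp(-m))
    return math.log1p(math.exp(-(score_pos - score_neg)))
```

At equal scores the loss is log 2, and it decays toward zero as the observed item's score overtakes the negative's.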
Uncertainty Weighting / Homoscedastic Uncertainty (2018) β Learns task weights by modeling task-dependent uncertainty; noisy tasks auto-downweighted. π Multi-Task Learning Using Uncertainty to Weigh Losses β Kendall et al. π» median-research-group/LibMTL
GradNorm (2018) β Dynamically normalizes gradient magnitudes across tasks to balance training rates. π GradNorm: Gradient Normalization for Adaptive Loss Balancing β Chen et al.
MGDA (2018) β Multi-objective optimization finding Pareto-optimal descent direction via Frank-Wolfe on task gradients. π Multi-Task Learning as Multi-Objective Optimization β Sener & Koltun
PCGrad (2020) β Projects conflicting task gradients onto normal planes to reduce destructive interference. π Gradient Surgery for Multi-Task Learning β Yu et al.
CAGrad (2021) β Minimizes average loss while maximizing worst-case local improvement across tasks. π Conflict-Averse Gradient Descent for Multi-task Learning β Liu et al.
Nash-MTL (2022) β Nash bargaining game where tasks negotiate a joint update direction. π Multi-Task Learning as a Bargaining Game β Navon et al. π» AvivNavon/nash-mtl
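Kendall et al.'s uncertainty weighting is commonly implemented with learnable log-variance parameters s_i = log σ_i², giving Σ_i exp(-s_i)·L_i + s_i (this parameterization, standard in libraries like LibMTL, keeps the weights positive without constraints):

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Kendall et al. (regression form): sum_i exp(-s_i) * L_i + s_i.
    Tasks that learn a larger s_i (noisier) are automatically down-weighted."""
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))
```

The additive s_i term stops the trivial solution of inflating every variance to zero out all losses.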
NLL with Learned Variance (1994) β Network predicts mean and variance; NLL naturally trades off accuracy and calibration. π Estimating the Mean and Variance of the Target Probability Distribution β Nix & Weigend
MC Dropout (2016) β Dropout at test time as approximate Bayesian inference for uncertainty estimation. π Dropout as a Bayesian Approximation β Gal & Ghahramani
Deep Ensembles Loss (2017) β Ensemble of networks with proper scoring rules + adversarial training for diversity. π Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles β Lakshminarayanan et al.
Evidential Deep Learning Loss (2018) β Dirichlet prior over class probabilities; Bayes risk + KL divergence regularizer. π Evidential Deep Learning to Quantify Classification Uncertainty β Sensoy et al.
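The Nix & Weigend objective above is the heteroscedastic Gaussian NLL; parameterizing the network's variance head as a log-variance is the usual numerically safe choice:

```python
import math

def gaussian_nll(y, mu, log_var):
    """Per-sample NLL (up to a constant): 0.5 * (log sigma^2 + (y - mu)^2 / sigma^2)."""
    return 0.5 * (log_var + (y - mu) ** 2 / math.exp(log_var))
```

The two terms trade off: large errors push the predicted variance up, but the log-variance term penalizes unconditionally inflated uncertainty.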
Maximum Mean Discrepancy (MMD) (2012) β Distribution distance in RKHS; aligns source and target features without adversarial training. π A Kernel Two-Sample Test β Gretton et al. π» ZongxianLee/MMD_Loss.Pytorch
Domain Adversarial Loss / DANN (2016) β Gradient reversal layer training domain classifier adversarially for domain-invariant features. π Domain-Adversarial Training of Neural Networks β Ganin et al. π» fungtion/DANN
Deep CORAL Loss (2016) β Aligns second-order statistics (covariance) of source and target deep features. π Deep CORAL: Correlation Alignment for Deep Domain Adaptation β Sun & Saenko
Wasserstein Distance for DA (2018) β Earth Mover's Distance as domain discrepancy measure with gradient penalty. π Wasserstein Distance Guided Representation Learning for Domain Adaptation β Shen et al.
Contrastive Domain Discrepancy (CDD) (2019) β Class-aware alignment maximizing inter-class and minimizing intra-class discrepancy across domains. π Contrastive Adaptation Network for Unsupervised Domain Adaptation β Kang et al.
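The MMD loss that opens this section reduces to three kernel-mean terms; a naive RBF-kernel sketch over small feature lists (biased V-statistic estimate, O(n²) loops — library versions batch this on GPU):

```python
import math

def mmd_rbf(X, Y, gamma=1.0):
    """Biased MMD^2 estimate with RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def k(x, y):
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    def mean_k(A, B):
        return sum(k(a, b) for a in A for b in B) / (len(A) * len(B))
    return mean_k(X, X) + mean_k(Y, Y) - 2.0 * mean_k(X, Y)
```

Minimizing this over the feature extractor pulls source and target feature distributions together without any adversarial game, in contrast to DANN below it.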
- π A Comprehensive Survey of Loss Functions and Metrics in Deep Learning β Terven et al. (2025)
- π A Survey of Loss Functions for Semantic Segmentation β Jadon (2020)
- π Loss Functions in the Era of Semantic Segmentation: A Survey and Outlook β Azad et al. (2023)
| Library | Focus | Link |
|---|---|---|
| PyTorch (built-in) | CE, BCE, MSE, Huber, CTC, KLDiv, etc. | pytorch.org |
| pytorch-metric-learning | Triplet, Contrastive, ArcFace, ProxyNCA, etc. | GitHub |
| SegLossOdyssey | Dice, Tversky, Boundary, Hausdorff, etc. | GitHub |
| Hugging Face TRL | DPO, PPO, KTO, ORPO, SimPO, etc. | GitHub |
| Stable-Baselines3 | DQN, PPO, SAC, TD3, A2C, etc. | GitHub |
| lightly | SimCLR, BYOL, MoCo, DINO, Barlow Twins, etc. | GitHub |
| insightface | ArcFace, CosFace, Sub-center ArcFace | GitHub |
| open_clip | CLIP, SigLIP contrastive losses | GitHub |
| PyTorch3D | Chamfer, mesh losses, point cloud losses | GitHub |
| PyTorch Geometric | GNN losses, link prediction, node classification | GitHub |
| LibMTL | Uncertainty weighting, GradNorm, PCGrad, Nash-MTL | GitHub |
| auraloss | Multi-Resolution STFT, mel losses | GitHub |
| BasicSR | Perceptual, SSIM, Charbonnier, GAN losses for SR | GitHub |
| kornia | Focal, Dice, SSIM, and more | GitHub |
| anomalib | Anomaly detection losses and methods | GitHub |
| Avalanche | Continual learning (EWC, SI, LwF, etc.) | GitHub |
| GluonTS | Time series forecasting losses | GitHub |
| audiocraft | Audio generation (EnCodec, MusicGen) | GitHub |
| AIF360 | Fairness and bias mitigation | GitHub |
If you find this useful, please star the repo β it helps others discover it.