Research Paper: https://arxiv.org/abs/1602.02068
The paper introduces sparsemax, a new activation function similar to softmax but able to output sparse probabilities.
It retains many of the properties of softmax, and its Jacobian can be computed efficiently, so it can be used in a neural network trained with backpropagation.
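A minimal NumPy sketch of the forward pass and the Jacobian-vector product, following the paper's description (sorting-based projection onto the simplex, plus the closed-form Jacobian); the function names here are illustrative and this is not the reference implementation in the "Code" directory:

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                  # scores in decreasing order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    # Support size: the largest k with 1 + k * z_(k) > sum of the top-k scores.
    k_z = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z            # threshold tau(z)
    return np.maximum(z - tau, 0.0)              # scores below tau get probability 0

def sparsemax_jvp(p, v):
    """Jacobian-vector product J(z) @ v, computed from the output p alone:
    on the support, subtract the mean of v over the support; elsewhere, zero."""
    s = p > 0
    out = np.zeros_like(v, dtype=float)
    out[s] = v[s] - v[s].mean()
    return out
```

For example, `sparsemax([2.0, 0.0, -1.0])` returns `[1.0, 0.0, 0.0]`, whereas softmax would assign nonzero probability to all three classes.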
The paper then defines a new smooth and convex loss function for sparsemax, the analogue of the logistic loss for softmax.
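A sketch of that loss under the same assumptions, reusing the `sparsemax` helper above; it computes L(z; k) = -z_k + ½ Σ_{j∈S(z)} (z_j² − τ²) + ½, whose gradient is simply sparsemax(z) − e_k:

```python
def sparsemax_loss(z, k):
    """Sparsemax analogue of the logistic loss for gold label k."""
    z = np.asarray(z, dtype=float)
    p = sparsemax(z)
    s = p > 0                                    # support S(z)
    tau = (z[s].sum() - 1) / s.sum()             # recompute the threshold tau(z)
    return -z[k] + 0.5 * np.sum(z[s] ** 2 - tau ** 2) + 0.5

def sparsemax_loss_grad(z, k):
    """Gradient of the loss: sparsemax(z) minus the one-hot gold vector."""
    grad = sparsemax(z)
    grad[k] -= 1.0
    return grad
```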
Promising empirical results are reported on multi-label classification problems and on attention-based neural networks for natural language inference, where sparsemax yields a more selective and compact attention focus.
The implementation is available in the "Code" directory.
A presentation of the paper is available in the "Presentation" directory.