A fully featured multilayer perceptron (MLP) implemented entirely from scratch in C++ for supervised learning tasks. It supports Mean Squared Error (MSE) and Cross-Entropy loss, online and offline training, and softmax outputs.
This project was developed as part of my Computer Engineering degree to learn and demonstrate the implementation of neural networks from the ground up.
Visualization of a simple 2‑2‑1 neural network architecture, showing input, hidden, and output layers.
This multilayer perceptron (MLP) implements the full forward and backward propagation pipeline from scratch. The steps are as follows (a condensed code sketch follows the list):

- **Forward Propagation**
  - Inputs are fed into the network.
  - Each hidden and output neuron computes a weighted sum of its inputs plus a bias.
  - Activation functions are applied:
    - Sigmoid for hidden layers
    - Sigmoid or Softmax for the output layer, depending on the task
  - The network produces an output vector representing either:
    - Continuous values for regression tasks
    - Class probabilities for classification tasks
- **Loss Calculation**
  - The network computes the error between predicted outputs and target values.
  - Supported loss functions:
    - Mean Squared Error (MSE)
    - Cross-Entropy Loss
- **Backpropagation**
  - Gradients of the loss with respect to each weight are computed using the chain rule.
  - Errors are propagated backward from the output layer to the input layer.
  - Weights are updated according to:
    - Learning rate (η)
    - Momentum (μ), if enabled
  - Supports online (per-sample) or offline (batch) weight updates.
- **Training Loop**
  - Repeat forward → loss → backward for a set number of iterations or until convergence.
  - Save the best-performing weights for evaluation or prediction.
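As a concrete illustration of these steps, here is a minimal, self-contained sketch of one online training step for a network with a single hidden layer, sigmoid activations, MSE loss, and momentum. It is only a sketch of the technique described above; the `Layer` and `train_step` names are illustrative, not the project's actual classes.

```cpp
// Illustrative sketch of one online training step (sigmoid + MSE + momentum).
// Names (Layer, train_step, ...) are examples, not the project's actual API.
#include <cmath>
#include <cstdlib>
#include <vector>

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

struct Layer {
    // w[j][i]: weight from input i to neuron j; b[j]: bias of neuron j.
    // dw_prev/db_prev store the previous update so a momentum term can be added.
    std::vector<std::vector<double>> w, dw_prev;
    std::vector<double> b, db_prev, out;

    Layer(int n_in, int n_out)
        : w(n_out, std::vector<double>(n_in)), dw_prev(n_out, std::vector<double>(n_in, 0.0)),
          b(n_out, 0.0), db_prev(n_out, 0.0), out(n_out, 0.0) {
        for (auto& row : w)
            for (double& v : row) v = std::rand() / (double)RAND_MAX - 0.5;  // small random init
    }

    // Forward propagation: weighted sum plus bias, then sigmoid.
    const std::vector<double>& forward(const std::vector<double>& in) {
        for (size_t j = 0; j < w.size(); ++j) {
            double z = b[j];
            for (size_t i = 0; i < in.size(); ++i) z += w[j][i] * in[i];
            out[j] = sigmoid(z);
        }
        return out;
    }
};

// One online step: forward -> MSE -> backpropagation -> weight update
// with learning rate eta and momentum mu: dw = -eta * dE/dw + mu * dw_prev.
double train_step(Layer& hidden, Layer& output, const std::vector<double>& x,
                  const std::vector<double>& target, double eta, double mu) {
    const auto& h = hidden.forward(x);
    const auto& y = output.forward(h);

    // MSE and output deltas: delta_k = (y_k - t_k) * y_k * (1 - y_k)
    double mse = 0.0;
    std::vector<double> delta_out(y.size());
    for (size_t k = 0; k < y.size(); ++k) {
        double e = y[k] - target[k];
        mse += e * e;
        delta_out[k] = e * y[k] * (1.0 - y[k]);
    }
    mse /= y.size();

    // Hidden deltas via the chain rule: delta_j = (sum_k w[k][j] * delta_k) * h_j * (1 - h_j)
    std::vector<double> delta_hid(h.size());
    for (size_t j = 0; j < h.size(); ++j) {
        double s = 0.0;
        for (size_t k = 0; k < y.size(); ++k) s += output.w[k][j] * delta_out[k];
        delta_hid[j] = s * h[j] * (1.0 - h[j]);
    }

    // Apply the update rule to a layer given its inputs and deltas.
    auto update = [&](Layer& L, const std::vector<double>& in, const std::vector<double>& delta) {
        for (size_t j = 0; j < L.w.size(); ++j) {
            for (size_t i = 0; i < in.size(); ++i) {
                double dw = -eta * delta[j] * in[i] + mu * L.dw_prev[j][i];
                L.w[j][i] += dw;
                L.dw_prev[j][i] = dw;
            }
            double db = -eta * delta[j] + mu * L.db_prev[j];
            L.b[j] += db;
            L.db_prev[j] = db;
        }
    };
    update(output, h, delta_out);
    update(hidden, x, delta_hid);
    return mse;
}
```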
| Flag | Argument | Required | Description |
|---|---|---|---|
| `-t` | `<file>` | Yes | Path to the training dataset. |
| `-T` | `<file>` | No | Path to the test dataset. If omitted, the training set is used for testing. |
| `-l` | `<int>` | No | Number of hidden layers. Default: 1. |
| `-h` | `<int>` | No | Neurons per hidden layer. Default: 4. |
| `-i` | `<int>` | No | Maximum number of training iterations. Default: 1000. |
| `-e` | `<float>` | No | Learning rate (η). Default: 0.7. |
| `-m` | `<float>` | No | Momentum coefficient (μ). Default: 1.0. |
| `-f` | `<int>` | No | Loss function: 0 = MSE, 1 = Cross-Entropy. Default: 0. |
| `-n` | — | No | Enable input normalization (scale to [-1, 1]). |
| `-o` | — | No | Enable online training (weight updates per sample). |
| `-s` | — | No | Use softmax activation in the output layer. |
| `-p` | — | No | Prediction mode (Kaggle-style). Requires `-w` to load saved weights. |
| `-w` | `<file>` | No | Save trained model weights to a file. |
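To make the flags and defaults above concrete, here is a hypothetical sketch of how they could be collected into an options struct and parsed with POSIX `getopt`. The `Options` struct and `parse_args` function are illustrative names, not the project's actual code, and the real implementation may parse arguments differently.

```cpp
// Hypothetical flag-parsing sketch; defaults mirror the table above.
#include <string>
#include <unistd.h>  // getopt, optarg

struct Options {
    std::string train_file;     // -t (required)
    std::string test_file;      // -T (optional; training set is reused for testing if empty)
    int hidden_layers = 1;      // -l
    int neurons = 4;            // -h
    int iterations = 1000;      // -i
    double eta = 0.7;           // -e, learning rate
    double mu = 1.0;            // -m, momentum coefficient
    int loss = 0;               // -f (0 = MSE, 1 = Cross-Entropy)
    bool normalize = false;     // -n
    bool online = false;        // -o
    bool softmax = false;       // -s
    bool predict = false;       // -p
    std::string weights_file;   // -w
};

Options parse_args(int argc, char* argv[]) {
    Options opt;
    int c;
    while ((c = getopt(argc, argv, "t:T:l:h:i:e:m:f:nospw:")) != -1) {
        switch (c) {
            case 't': opt.train_file = optarg; break;
            case 'T': opt.test_file = optarg; break;
            case 'l': opt.hidden_layers = std::stoi(optarg); break;
            case 'h': opt.neurons = std::stoi(optarg); break;
            case 'i': opt.iterations = std::stoi(optarg); break;
            case 'e': opt.eta = std::stod(optarg); break;
            case 'm': opt.mu = std::stod(optarg); break;
            case 'f': opt.loss = std::stoi(optarg); break;
            case 'n': opt.normalize = true; break;
            case 'o': opt.online = true; break;
            case 's': opt.softmax = true; break;
            case 'p': opt.predict = true; break;
            case 'w': opt.weights_file = optarg; break;
        }
    }
    return opt;
}
```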
The network expects the dataset in a simple text format (.dat). Each file should contain numeric values only.
- First line: Network structure
  `<number_of_inputs> <number_of_outputs> <number_of_hidden_neurons_per_layer>`
- Following lines: Input and target values
- Each row contains input values followed by target values.
- Example:

  ```
  1 -1 1 0
  -1 -1 0 1
  -1 1 1 0
  1 1 0 1
  ```
- The first N numbers are inputs
- The last M numbers are targets
- No headers should be included; numeric values only.
- You can enable input normalization with the `-n` flag to scale inputs to [-1, 1].
- Supports multi-output for regression or classification tasks.
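As an illustration of this format, a loader could look like the sketch below: it reads the structure line, splits each row into inputs and targets, and optionally rescales the inputs to [-1, 1] as described for the `-n` flag. The `Dataset` struct and `load_dat` function are hypothetical names, not the project's actual API.

```cpp
// Hypothetical loader for the .dat format described above.
#include <algorithm>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Dataset {
    int n_inputs = 0, n_outputs = 0, n_hidden = 0;
    std::vector<std::vector<double>> inputs, targets;
};

Dataset load_dat(const std::string& path, bool normalize) {
    std::ifstream in(path);
    if (!in) throw std::runtime_error("cannot open " + path);

    Dataset d;
    // First line: <number_of_inputs> <number_of_outputs> <number_of_hidden_neurons_per_layer>
    in >> d.n_inputs >> d.n_outputs >> d.n_hidden;

    // Remaining lines: N input values followed by M target values per row.
    std::vector<double> row(d.n_inputs + d.n_outputs);
    while (in >> row[0]) {
        for (int i = 1; i < d.n_inputs + d.n_outputs; ++i) in >> row[i];
        d.inputs.emplace_back(row.begin(), row.begin() + d.n_inputs);
        d.targets.emplace_back(row.begin() + d.n_inputs, row.end());
    }

    // Optional min-max rescaling of each input column to [-1, 1] (the -n behavior).
    if (normalize && !d.inputs.empty()) {
        for (int i = 0; i < d.n_inputs; ++i) {
            double lo = d.inputs[0][i], hi = d.inputs[0][i];
            for (const auto& x : d.inputs) { lo = std::min(lo, x[i]); hi = std::max(hi, x[i]); }
            if (hi == lo) continue;  // constant column: leave unchanged
            for (auto& x : d.inputs) x[i] = 2.0 * (x[i] - lo) / (hi - lo) - 1.0;
        }
    }
    return d;
}
```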
The XOR problem is a classic example of a non-linearly separable function. A simple perceptron cannot solve it, but a small MLP with one hidden layer and two neurons can learn it due to non-linear activation functions.
| Input 1 | Input 2 | Target |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
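Expressed in the .dat format described earlier, this truth table could be written as the file below. The structure line here assumes 2 inputs, 1 output, and 2 hidden neurons per layer (matching the 2-2-1 architecture); the repository's actual data/train_xor.dat may differ, for example by using {-1, 1} inputs or one-hot targets as in the earlier format example.

```
2 1 2
0 0 0
0 1 1
1 0 1
1 1 0
```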
The network predicts continuous outputs using a sigmoid activation on the output neuron.
Sigmoid predictions for XOR regression. The model successfully separates the non-linear classes.
Training convergence for regression: MSE decreases over iterations for both training and test datasets.
For classification, the output layer uses softmax activation to provide class probabilities.
Softmax probabilities for XOR classification. The model successfully separates the non-linear classes.
Training convergence for classification: Cross-Entropy improves and stabilizes as training progresses.
This demonstrates the network's ability to learn non-linear mappings, a key feature of multilayer perceptrons.
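For reference, here is a minimal sketch of the softmax and cross-entropy pieces used in the classification setup. It illustrates the technique rather than reproducing the project's code; a convenient property shown in the last helper is that with softmax outputs and cross-entropy loss, the output-layer delta reduces to y_k - t_k, which plugs directly into the backpropagation step sketched earlier.

```cpp
// Illustrative softmax and cross-entropy helpers (not the project's exact code).
#include <algorithm>
#include <cmath>
#include <vector>

// Softmax over the raw output-layer sums z, shifted by max(z) for numerical stability.
std::vector<double> softmax(const std::vector<double>& z) {
    double zmax = *std::max_element(z.begin(), z.end());
    std::vector<double> y(z.size());
    double sum = 0.0;
    for (size_t k = 0; k < z.size(); ++k) { y[k] = std::exp(z[k] - zmax); sum += y[k]; }
    for (double& v : y) v /= sum;
    return y;
}

// Cross-entropy loss for one sample: -sum_k t_k * log(y_k).
double cross_entropy(const std::vector<double>& y, const std::vector<double>& t) {
    double loss = 0.0;
    for (size_t k = 0; k < y.size(); ++k) loss -= t[k] * std::log(y[k] + 1e-12);  // epsilon avoids log(0)
    return loss;
}

// With softmax + cross-entropy, the output-layer delta is simply (y_k - t_k).
std::vector<double> output_delta(const std::vector<double>& y, const std::vector<double>& t) {
    std::vector<double> delta(y.size());
    for (size_t k = 0; k < y.size(); ++k) delta[k] = y[k] - t[k];
    return delta;
}
```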
Follow these steps to compile and run the neural network:
The repository includes a Makefile. Simply run:
```bash
git clone https://github.com/javierespdev/neural-network-from-scratch.git
cd neural-network-from-scratch
make
```

Run the executable with the training dataset:
```bash
./bin/mlp -t data/train_xor.dat
```

By default, the network will train with 1 hidden layer and 4 neurons per layer. You can adjust parameters like the learning rate, number of iterations, or number of hidden layers using command-line flags:
```bash
./bin/mlp -t data/train_xor.dat -l 2 -h 3 -i 5000 -e 0.5 -f 1 -s -o
```

- `-l 2` → 2 hidden layers
- `-h 3` → 3 neurons per hidden layer
- `-i 5000` → 5000 iterations
- `-e 0.5` → learning rate 0.5
- `-f 1` → Cross-Entropy loss
- `-s` → softmax output
- `-o` → online training mode
Once trained, you can save weights and use them for prediction:
```bash
# Save trained weights
./bin/mlp -t data/train_xor.dat -w weights.bin

# Predict with saved model
./bin/mlp -p -w weights.bin -T data/test_xor.dat
```

This project is licensed under the MIT License - see the LICENSE file for details.