This Jupyter Notebook provides a step-by-step implementation of a simple neural network and an automatic differentiation engine (similar to Andrej Karpathy's micrograd) from scratch using Python. It's designed to be an educational tool to understand the fundamentals of how neural networks learn through backpropagation.
The notebook walks through:
- Defining basic mathematical functions and manually calculating their derivatives (a short numerical sketch follows this list).
- Introducing the `Value` class, a core component that encapsulates scalar values and enables automatic gradient computation.
- Overloading arithmetic operators (`+`, `*`, `**`) and implementing activation functions (`tanh`, `exp`) for the `Value` class to build and track computation graphs.
- Implementing the `backward()` pass to automatically compute gradients for all parameters in a computation graph using the chain rule.
- Visualizing these computation graphs using `graphviz` to better understand the forward and backward passes.
- Building `Neuron`, `Layer`, and `MLP` (Multi-Layer Perceptron) classes using the `Value` object.
- Demonstrating a simple training loop where the MLP learns to fit a small dataset.
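For example, the first step above (manually calculating a derivative) amounts to measuring a slope numerically. A minimal sketch of the idea, using an arbitrary quadratic that may differ from the exact function in the notebook:

```python
# Estimate df/dx numerically by nudging the input by a small step h.
def f(x):
    return 3 * x**2 - 4 * x + 5   # example function (illustrative choice)

h = 0.0001
x = 3.0
slope = (f(x + h) - f(x)) / h     # rise over run approximates the derivative at x
print(slope)                      # ~14.0, matching the analytic derivative 6x - 4 at x = 3
```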
- Scalar `Value` Object: Tracks operations and gradients.
- Automatic Differentiation: Implements backpropagation for gradient calculation.
- Supported Operations:
  - Addition (`+`)
  - Multiplication (`*`)
  - Power (`**`)
  - Tanh activation
  - Exponentiation (`exp`)
  - Subtraction (`-`), Division (`/`), Negation (`-`) (derived from the above)
- Neural Network Components:
  - `Neuron` class
  - `Layer` class (a collection of neurons)
  - `MLP` class (a multi-layer perceptron)
- Computation Graph Visualization: Uses `graphviz` to draw the network's structure and the flow of data and gradients.
- Training Loop: A basic example of training the MLP using gradient descent (sketched just below).
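To make the last bullet concrete, here is a rough sketch of the kind of training loop the notebook builds up to. It assumes the `MLP` class and `Value` objects described in this README, with a constructor taking the input size and a list of layer sizes, and a `parameters()` method returning all weights and biases; the dataset, learning rate, and step count are illustrative rather than the notebook's exact values.

```python
# Toy dataset: four 3-dimensional inputs and their scalar targets.
xs = [[2.0, 3.0, -1.0], [3.0, -1.0, 0.5], [0.5, 1.0, 1.0], [1.0, 1.0, -1.0]]
ys = [1.0, -1.0, -1.0, 1.0]

n = MLP(3, [4, 4, 1])   # 3 inputs -> two hidden layers of 4 neurons -> 1 output

for step in range(20):
    # Forward pass: predictions and a squared-error loss.
    # (sum() relies on Value supporting -, ** and reflected addition.)
    ypred = [n(x) for x in xs]
    loss = sum((yout - ygt) ** 2 for ygt, yout in zip(ys, ypred))

    # Backward pass: zero stale gradients, then backpropagate from the loss.
    for p in n.parameters():
        p.grad = 0.0
    loss.backward()

    # Gradient descent: nudge each parameter against its gradient.
    for p in n.parameters():
        p.data += -0.05 * p.grad

    print(step, loss.data)
```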
The core of the automatic differentiation engine is the `Value` class:

- Each `Value` object stores a scalar `data` value and its associated `grad` (gradient), initialized to 0.
- When arithmetic operations or supported functions (like `tanh`) are performed on `Value` objects, a new `Value` object is created. This new object remembers its "children" (the operands or the input `Value`) and the operation (`_op`) that produced it.
- Crucially, each operation also defines a local `_backward` function. This function knows how to propagate the gradient from the output `Value` back to its input "children" `Value`s using the chain rule of calculus. For example, for `c = a + b`, `a.grad` is incremented by `1.0 * c.grad`, and `b.grad` is incremented by `1.0 * c.grad` as well.
- The `backward()` method, when called on a `Value` (typically the final loss value), performs a topological sort of the entire computation graph leading to that `Value`.
- It then iterates through the sorted nodes in reverse order, calling the `_backward` function of each node. This accumulates the gradients for all `Value` objects that participated in the computation, effectively performing backpropagation.
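Put together, a heavily condensed sketch of such a `Value` class might look like the following; this is an illustration of the mechanism rather than the notebook's full implementation, which also covers `**`, `exp`, and the derived operations listed above.

```python
import math

class Value:
    """A scalar that remembers how it was produced, so gradients can flow back."""

    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0                   # d(final output) / d(this value)
        self._backward = lambda: None     # pushes this node's grad onto its children
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')
        def _backward():
            self.grad += 1.0 * out.grad   # d(out)/d(self) = 1
            other.grad += 1.0 * out.grad  # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), 'tanh')
        def _backward():
            self.grad += (1 - t ** 2) * out.grad   # d/dx tanh(x) = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort: every node appears after all of its children,
        # so reversing the order processes each node after its consumers.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)

        self.grad = 1.0                   # seed: d(self)/d(self) = 1
        for node in reversed(topo):
            node._backward()
```

With this, calling `backward()` on, say, `(a * b + c).tanh()` fills in `a.grad`, `b.grad`, and `c.grad` automatically.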
To run this notebook, you'll need Python 3 and the following libraries:
- `math` (standard library)
- `random` (standard library, used for weight initialization)
- `numpy`
- `matplotlib` (for plotting)
- `graphviz` (for visualizing computation graphs). You might need to install the Graphviz software separately from the Python library; see the Graphviz download page.
- `torch` (PyTorch is used in one cell for comparison/verification of gradients but is not a core dependency for the custom implementation itself; see the short check at the end of this README)
You can typically install the Python libraries using pip:
`pip install numpy matplotlib graphviz torch`
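Once the dependencies are installed, a rough illustration of what the PyTorch comparison cell is doing (the notebook's exact expression will differ) is to build the same kind of tiny scalar graph with `torch` and compare the resulting gradients against the custom `Value` engine:

```python
import torch

# Leaf scalars with gradient tracking enabled, in double precision.
x = torch.tensor(0.5, dtype=torch.float64, requires_grad=True)
w = torch.tensor(-1.5, dtype=torch.float64, requires_grad=True)

out = (x * w).tanh()
out.backward()

print(x.grad.item(), w.grad.item())  # should agree with the gradients from the custom engine
```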