yajur-khanna/micrograd

Micrograd — Clean Educational Reimplementation

A minimal, from-scratch automatic differentiation engine inspired by Andrej Karpathy’s micrograd, implemented with clarity and pedagogical intent.

This project demonstrates how reverse-mode autodiff (backpropagation) works under the hood by building a tiny scalar-based computation graph engine without relying on PyTorch, TensorFlow, or JAX.


Features

  • Scalar automatic differentiation

  • Dynamic computation graph

  • Reverse-mode backpropagation

  • Supported operations:

    • +, -, *, **
    • ReLU, Sigmoid
  • No external ML libraries
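
The local derivatives behind the ReLU and Sigmoid operations listed above can be sketched with plain floats. This is a minimal illustration, not the repository's code: the function names `relu`, `relu_grad`, `sigmoid`, and `sigmoid_grad` are hypothetical, and the choice of a zero gradient at exactly x = 0 for ReLU is an assumption (a common convention).

```python
import math


def relu(x: float) -> float:
    # Forward: pass positive inputs through, clamp the rest to zero.
    return x if x > 0 else 0.0


def relu_grad(x: float) -> float:
    # Local derivative: 1 for x > 0, otherwise 0 (0 chosen at x == 0 by assumption).
    return 1.0 if x > 0 else 0.0


def sigmoid(x: float) -> float:
    # Forward: 1 / (1 + e^(-x)). Very negative x can overflow math.exp;
    # this sketch does not guard against that.
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    # Local derivative expressed through the forward value: s * (1 - s).
    s = sigmoid(x)
    return s * (1.0 - s)


print(relu(-1.5), relu_grad(2.0), sigmoid(0.0), sigmoid_grad(0.0))
```

In an autodiff engine, each of these derivatives would be multiplied by the upstream gradient before being accumulated into the input node.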


Project Goals

This project is not intended to be:

  • A performant deep learning framework

  • A drop-in replacement for PyTorch/JAX

  • Feature-complete beyond core autodiff mechanics

It is intended to:

  • Make backpropagation mechanically obvious

  • Help students understand how gradients flow

  • Serve as a foundation for extending toward neural networks

  • Act as a reference when learning larger frameworks


Core Abstraction

At the heart of the system is a Value object that stores:

  • data: the scalar value

  • grad: the accumulated gradient

  • _prev: parent nodes in the computation graph

  • _op: the operation that produced the node

  • _backward(): local gradient rule
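
A minimal sketch of such an object, assuming the fields listed above and showing only multiplication. The constructor parameters `_children` and `_op` and the closure-based `_backward` are assumptions modeled on the general micrograd design; the class in this repository may differ in detail.

```python
class Value:
    """Hypothetical minimal node: one scalar plus the bookkeeping for backprop."""

    def __init__(self, data, _children=(), _op=""):
        self.data = data                # the scalar value
        self.grad = 0.0                 # accumulated gradient, starts at zero
        self._prev = set(_children)     # parent nodes that produced this one
        self._op = _op                  # label of the producing operation
        self._backward = lambda: None   # leaves have nothing to propagate

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), "*")

        def _backward():
            # Local rule for out = self * other, scaled by the upstream gradient.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out


a = Value(2.0)
b = Value(-3.0)
c = a * b
c.grad = 1.0    # seed: dc/dc = 1
c._backward()   # apply the local rule of this single node
print(c.data, c._op, a.grad, b.grad)
```

Gradients are accumulated with `+=` rather than assigned, so a node that feeds into several operations receives the sum of all its contributions.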

Backpropagation is performed by:

  • Topologically sorting the computation graph

  • Traversing it in reverse

  • Applying the chain rule at each node
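
The three steps above can be sketched as follows. The `Node` class and the free functions `add`, `mul`, and `backward` are hypothetical stand-ins used to keep the example short; they are not the repository's API. The recursive depth-first sort is one common way to obtain a topological order and is an assumption here.

```python
class Node:
    """Hypothetical stand-in exposing only data, grad, _prev and _backward."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._prev = tuple(parents)
        self._backward = lambda: None


def mul(a, b):
    out = Node(a.data * b.data, (a, b))

    def _backward():
        a.grad += b.data * out.grad
        b.grad += a.data * out.grad

    out._backward = _backward
    return out


def add(a, b):
    out = Node(a.data + b.data, (a, b))

    def _backward():
        a.grad += out.grad
        b.grad += out.grad

    out._backward = _backward
    return out


def backward(root):
    # 1. Topologically sort: a node is appended only after all its parents.
    topo, visited = [], set()

    def build(node):
        if node not in visited:
            visited.add(node)
            for parent in node._prev:
                build(parent)
            topo.append(node)

    build(root)
    # 2. Seed the output gradient and traverse in reverse order.
    root.grad = 1.0
    for node in reversed(topo):
        # 3. Each node applies its local chain-rule step.
        node._backward()


x = Node(2.0)
y = Node(-3.0)
z = add(mul(x, y), mul(x, x))  # z = x*y + x*x
backward(z)
print(z.data, x.grad, y.grad)  # -2.0 1.0 2.0
```

The reverse topological order guarantees that a node's gradient is complete before it is pushed further back to its parents. The recursive sort is fine for small graphs but would hit Python's recursion limit on very deep ones.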


Example Usage

from micrograd import Value

x = Value(2.0)
y = Value(-3.0)
z = x * y + x**2

z.backward()

print(z.data)  # forward pass result: x*y + x**2
print(x.grad)  # dz/dx = y + 2*x
print(y.grad)  # dz/dy = x

This example dynamically builds a computation graph (shown in the figure below) and computes gradients via reverse-mode autodiff.

[Figure: Computation Graph]
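
As a cross-check, the values the example should print can be worked out by hand with ordinary calculus, assuming `**` denotes exponentiation as in standard Python:

```latex
\[
\begin{aligned}
z &= xy + x^{2} \\
z\big|_{x=2,\;y=-3} &= (2)(-3) + 2^{2} = -2 \\
\frac{\partial z}{\partial x} &= y + 2x = -3 + 2(2) = 1 \\
\frac{\partial z}{\partial y} &= x = 2
\end{aligned}
\]
```

The gradient of x has two contributions, one from the product term and one from the squared term, which is why gradients must be accumulated when a node is used more than once.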


Why Scalar-Based?

A reasonable critique is: Why not tensors?

The answer is clarity.

Scalar-based autodiff:

  • Makes the chain rule explicit

  • Avoids the shape and broadcasting details that vectorized code brings along

  • Forces understanding of computation graphs

  • Maps directly to how tensor frameworks generalize

Tensor support can be layered on later once the fundamentals are solid.


Inspiration & References

  • Andrej Karpathy — micrograd

  • Reverse-mode automatic differentiation

  • Computational graph theory

  • Backpropagation via chain rule


Possible Extensions

If you want to push this further:

  • Vector / tensor support

  • Neural network layers (Linear, MLP)

  • Optimizers (SGD, Adam)

  • Loss functions

  • GPU acceleration (purely educational)
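
As a starting point for the optimizer extension, a plain SGD update can be sketched with ordinary floats. The function `sgd_step`, the single data point, and the learning rate are all hypothetical. The gradient is written out by hand here, whereas in an extended engine it would presumably be read from each parameter's `grad` after calling `backward()`.

```python
def sgd_step(params, grads, lr=0.1):
    """Return new parameter values moved one step against their gradients."""
    return [p - lr * g for p, g in zip(params, grads)]


# Fit w so that w * x matches y for one data point (x = 2, y = 6).
x, y = 2.0, 6.0
w = [0.0]
for _ in range(50):
    pred = w[0] * x
    grad_w = 2.0 * (pred - y) * x  # d/dw of the squared error (w*x - y)**2
    w = sgd_step(w, [grad_w], lr=0.05)

print(w[0])  # approaches 3.0, since 3 * 2 = 6
```

With these settings each step shrinks the remaining error by a constant factor, so the estimate converges geometrically toward 3.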

About

Building a basic neural network from the ground up
