This project was inspired by an interest in exploring pruning and sparse tensors.
- Pruning_Tutorial.ipynb: Based on the official PyTorch pruning tutorial: https://docs.pytorch.org/tutorials/intermediate/pruning_tutorial.html
- Testing_Prunes.ipynb: An exploration that builds on the tutorial techniques to study how different pruning styles affect model accuracy.
I originally expected a smoother trade-off: the model loses some weights, retrains in the pruned state, and recovers to reasonably high accuracy. In the (limited) comparisons I ran here, the two approaches diverged far more sharply than that.
I compared two pruning approaches (a minimal code sketch follows the list):
- Per-layer pruning: unstructured, L1-magnitude pruning applied independently per layer (removing the least important fraction within each layer).
- Global pruning: unstructured, L1-magnitude pruning applied across the whole model (removing the least important weights across all layers).
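A minimal sketch of both approaches using torch.nn.utils.prune. The architecture and the 20% amount are illustrative, not the exact values from the notebooks:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def make_model():
    # Illustrative architecture; not the exact model from the notebooks.
    # 54 inputs / 7 outputs match the Covertype dataset used below.
    return nn.Sequential(nn.Linear(54, 256), nn.ReLU(), nn.Linear(256, 7))

# Per-layer: remove the smallest-magnitude 20% of weights in EACH layer,
# regardless of how important that layer is overall.
per_layer = make_model()
for module in per_layer.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.2)

# Global: remove the smallest-magnitude 20% of weights across ALL layers,
# so redundant layers can absorb more of the cut than critical ones.
global_model = make_model()
parameters_to_prune = [
    (m, "weight") for m in global_model.modules() if isinstance(m, nn.Linear)
]
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.2,
)
```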
In general, the globally pruned model is far more reliable than the per-layer model. That makes intuitive sense: with global pruning, the most redundant weights can be removed from across the (intentionally over-parameterized) network. I chose an over-parameterized model on purpose to see whether a deeper model could survive more iterative pruning while retaining accuracy.
By contrast, the per-layer model's performance collapses immediately, and it then appears to go through jarring training cycles. In the runs I performed, it never reached higher accuracy than the globally pruned model. This also makes sense: per-layer pruning removes (for example) the “worst” 20% of weights in every layer, regardless of how important that layer is overall. A layer’s “worst” weights can still be more important than another layer’s “best” weights, so pruning uniformly per layer can be much more destructive.
One interesting detail: from the printed metrics during pruning/training, much of the global pruning seems to happen in the middle layers of the network—the most “inner” layers appear to be pruned the most. This suggests a follow-up exploration into model size, depth, and layer-wise redundancy.
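A hedged sketch of how that layer-wise sparsity can be inspected; the metrics printed in the notebooks may differ in detail, but calling report_sparsity on the globally pruned model from the sketch above would surface where the cuts landed:

```python
import torch.nn as nn

def report_sparsity(model: nn.Module) -> None:
    """Print the fraction of zeroed weights in each Linear layer."""
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            weight = module.weight  # the masked weight once pruning is applied
            pct = 100.0 * float((weight == 0).sum()) / weight.numel()
            print(f"{name}: {pct:.1f}% of weights pruned")
```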
This experiment uses the relatively simple Covertype dataset, fetched via scikit-learn: https://archive.ics.uci.edu/dataset/31/covertype
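For reference, a sketch of how the dataset can be loaded; fetch_covtype downloads the same UCI Covertype data, and the split below is illustrative:

```python
from sklearn.datasets import fetch_covtype
from sklearn.model_selection import train_test_split

data = fetch_covtype()  # 581,012 rows, 54 features, 7 cover types
X_train, X_val, y_train, y_val = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
# Note: targets are labeled 1-7; subtract 1 before using CrossEntropyLoss.
```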
Because the model is quite deep, it fits the dataset early in training. As a result, the best-performing checkpoints appear very early: obviously so in the per-layer case, since accuracy crashes after the first pruning step, and less obviously in the global case, which in my runs reaches its best accuracy and lowest evaluation loss around epoch ~300.
All tests were run on an NVIDIA RTX 4000 (Blackwell architecture) with CUDA 13 and PyTorch 2.10.
Feel free to adjust pruning hyperparameters, training settings, or even swap the dataset to conduct your own experiments.
These plots are generated by the comparison run stored under checkpoints/_comparisons/.
Corrective note (data issue): In the train vs. validation loss plot above, the training-loss and validation-loss curves are not computed from the same model state at each checkpoint. In the training loop, train_loss is logged before pruning (after the model has trained for the epoch), while the checkpoints later evaluated for val_loss are saved after pruning (the newly pruned model). At prune epochs, the reported train_loss therefore corresponds to the pre-prune, recovered network, whereas val_loss corresponds to the post-prune, freshly cut network. This systematic pre-prune vs. post-prune mismatch makes the two lines appear to diverge sharply (especially under per-layer pruning) because they sample opposite sides of the pruning “shock,” not the same model snapshot.
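A hypothetical restructuring of the logging order that would remove this artifact. Here train_one_epoch, evaluate, prune_step, is_prune_epoch, log, and save_checkpoint are placeholder names standing in for the notebook's actual helpers, not its real API:

```python
for epoch in range(num_epochs):
    train_loss = train_one_epoch(model, train_loader)
    val_loss = evaluate(model, val_loader)  # pre-prune: same state as train_loss
    log(epoch, train_loss=train_loss, val_loss=val_loss)

    if is_prune_epoch(epoch):
        prune_step(model)  # apply the next round of pruning
        # Log the post-prune loss as its own series so the pruning "shock"
        # is visible explicitly instead of contaminating val_loss.
        log(epoch, post_prune_val_loss=evaluate(model, val_loader))
        save_checkpoint(model, epoch)  # checkpoint now matches its own logs
```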

