This repository contains the code accompanying the paper A Minimum Description Length Approach to Regularization in Neural Networks. It builds on the Minimum Description Length RNN repository and its PyTorch port MDLRNN-torch. We extend the original codebase to support additional regularization schemes, golden networks, gradient descent training of MDL-RNNs, and include various improvements and bug fixes.
- Install Python 3.9
- Install system dependencies

  macOS:

  ```
  brew install freetype mpich
  ```

  Ubuntu:

  ```
  apt-get install libsm6 libxext6 libxrender1 libffi-dev libopenmpi-dev libssl-dev libnss3-dev libncurses5-dev
  ```

- Install Python packages:

  ```
  pip install -r requirements.txt
  ```

- Running a simulation:

  ```
  python main.py --simulation <simulation_name> -n <number_of_islands>
  ```

Simulations are defined in `simulations.py`. Intermediate and final networks are saved in the `networks/` directory as `.pkl` and `.dot` files.
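A saved network can be inspected directly. Here is a minimal sketch, assuming the `.pkl` files are standard Python pickles (the filename below is hypothetical, and the actual object type depends on the repository's own classes):

```python
import pickle
from pathlib import Path

def load_network(path):
    """Load a pickled network object saved by a simulation run."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical example:
# net = load_network(Path("networks") / "dyck_2_best.pkl")
# print(net)
```

The `.dot` files can likewise be rendered with Graphviz to visualize the network architecture.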
All experiments from the paper are reproducible using this repository. Grammars are defined in `corpora.py`, golden networks in `manual_nets.py`, and experiments in `simulations.py`. The experiments were run on a SLURM cluster using MPI for parallelism. To run locally, set `migration_channel="file"`.
The tasks are defined in `simulations.py` as:

- `an_bn`
- `an_bn_cn`
- `dyck_1`
- `dyck_2`
- `arithmetic`
- `toy_english`
To control the regularization method, you can override the configuration. By default, the simulation runs with the MDL regularizer, defined by `grammar_multiplier=1` and `data_given_grammar_multiplier=1`, with `regularization_method=None`. To use L2 regularization, for example, set `grammar_multiplier=0` and `regularization_method="l2"`.
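To make the effect of these settings concrete, here is an illustrative sketch of how the objective changes under such a config. The function and variable names are hypothetical, not the repository's API: with `grammar_multiplier=0` the grammar-size term |G| drops out, and an L2 penalty over the weights is added to the data-encoding term |D:G| instead.

```python
def objective(data_given_grammar, grammar_size, weights,
              grammar_multiplier=1, data_given_grammar_multiplier=1,
              regularization_method=None, reg_lambda=1.0):
    """Hypothetical sketch of the regularized objective."""
    total = (data_given_grammar_multiplier * data_given_grammar
             + grammar_multiplier * grammar_size)
    if regularization_method == "l2":
        # Add a squared-weight penalty in place of the grammar cost.
        total += reg_lambda * sum(w * w for w in weights)
    return total

# MDL objective: |G| + |D:G|
mdl = objective(data_given_grammar=100.0, grammar_size=40.0, weights=[1.0, -2.0])
# L2 objective: |D:G| + lambda * sum(w^2)
l2 = objective(data_given_grammar=100.0, grammar_size=40.0, weights=[1.0, -2.0],
               grammar_multiplier=0, regularization_method="l2")
```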
The configuration can be overridden via the command line using `--config`, where you can also specify `max_grammar_size` to limit the grammar size.
An example of running a Dyck-2 simulation locally (without MPI) with L2 regularization:

```
python main.py -s dyck_2 -n 250 --config "{\"golden_networks\": [\"dyck_2\"], \"grammar_multiplier\": 0, \"regularization_method\": \"l2\", \"no_improvement_time\": 0, \"migration_channel\": \"file\"}"
```

To recreate the entire experiment, run all four regularizers on all tasks using the appropriate `--config` command-line options.
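Escaping the JSON by hand is error-prone; one convenient alternative is to build the command string programmatically. This is just a sketch using the config keys from the example above:

```python
import json
import shlex

# Same config as the Dyck-2 example above.
config = {
    "golden_networks": ["dyck_2"],
    "grammar_multiplier": 0,
    "regularization_method": "l2",
    "no_improvement_time": 0,
    "migration_channel": "file",
}

# shlex.quote handles the shell escaping of the JSON string.
cmd = f"python main.py -s dyck_2 -n 250 --config {shlex.quote(json.dumps(config))}"
```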
Again, the tasks are defined in `simulations.py` as:

- `an_bn_train_weights_from_golden`
- `an_bn_cn_train_weights_from_golden`
- `dyck_1_train_weights_from_golden`
- `dyck_2_train_weights_from_golden`
- `arithmetic_train_weights_from_golden`
- `toy_english_train_weights_from_golden`
In these simulations, the population is initialized only with golden networks and no architecture mutations are allowed (via the configs `num_golden_copies_in_initialization=500` and `allow_architecture_changing_mutations=False`, respectively).
To train our differentiable golden networks, use the `train_mdlrnn_backprop.py` script, which trains networks on all the tasks from the paper.
To analyze results from GA simulations, maintain a CSV file that logs each simulation along with its ID, status (e.g., Running, Finished, Time Limit), and objective.
The CSV should look something like this:

```
task,server,current_job_id,state,objective,simulation_id
an_bn,ServerA,1,Finished,MDL,an_bn_prior_0.3_batch_500_<simulation_hash>
an_bn,ServerA,2,Finished,|D:G| + L1,an_bn_prior_0.3_batch_500_<simulation_hash>
an_bn,ServerA,3,Finished,|D:G| + L2,an_bn_prior_0.3_batch_500_<simulation_hash>
an_bn,ServerA,4,Finished,|D:G| + Limit |G|,an_bn_prior_0.3_batch_500_<simulation_hash>
...
```

Then, run the following script on the server that holds the results:
```
python analysis/generate_simulation_results.py
```

This will generate a results CSV that can be further analyzed locally, without needing to access the server again, using:

```
python analysis/analyze_simulation_results.py
```

For analyzing Experiment 3 results: after `train_mdlrnn_backprop.py` has run, it generates a results CSV that can be analyzed with:

```
python analysis/analyze_backprop_results.py
```

Note that the analysis scripts generate both the plot figures from the paper and the LaTeX tables.
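The tracking CSV can also be queried directly, for example to check which runs have finished. A small sketch using the column names from the example above (the helper and sample data are illustrative, not part of the repository):

```python
import csv
import io

def finished_rows(csv_text):
    """Return the rows whose 'state' column is 'Finished'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["state"] == "Finished"]

# Illustrative sample matching the tracking-CSV format.
example = """task,server,current_job_id,state,objective,simulation_id
an_bn,ServerA,1,Finished,MDL,an_bn_sim_1
an_bn,ServerA,2,Running,MDL,an_bn_sim_2
"""
done = finished_rows(example)
```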
```bibtex
@misc{2025minimumdescriptionlengthapproach,
  title={A Minimum Description Length Approach to Regularization in Neural Networks},
  author={Matan Abudy and Orr Well and Emmanuel Chemla and Roni Katzir and Nur Lan},
  year={2025},
  eprint={2505.13398},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.13398},
}
```