This repository contains the code accompanying the paper A Minimum Description Length Approach to Regularization in Neural Networks. It builds on the Minimum Description Length RNN repository and its PyTorch port MDLRNN-torch. We extend the original codebase to support additional regularization schemes, golden networks, gradient descent training of MDL-RNNs, and include various improvements and bug fixes.
- Install Python 3.9
- Install system dependencies

  macOS:

  ```
  brew install freetype mpich
  ```

  Ubuntu:

  ```
  apt-get install libsm6 libxext6 libxrender1 libffi-dev libopenmpi-dev libssl-dev libnss3-dev libncurses5-dev
  ```

- Install Python packages:

  ```
  pip install -r requirements.txt
  ```

- Running a simulation:

  ```
  python main.py --simulation <simulation_name> -n <number_of_islands>
  ```

Simulations are defined in `simulations.py`. Intermediate and final networks are saved in the `networks/` directory as `.pkl` and `.dot` files.
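A saved network can be inspected directly. Here is a minimal sketch, assuming the `.pkl` files are standard Python pickles (the filename below is hypothetical, and the actual object type depends on the repository's own classes):

```python
import pickle
from pathlib import Path

def load_network(path):
    """Load a pickled network object saved by a simulation run."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical example:
# net = load_network(Path("networks") / "dyck_2_best.pkl")
# print(net)
```

The `.dot` files can likewise be rendered with Graphviz to visualize the network architecture.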
All experiments from the paper are reproducible using this repository. Grammars are defined in `corpora.py`, golden networks in `manual_nets.py`, and experiments in `simulations.py`. The experiments were run on a SLURM cluster using MPI for parallelism. To run locally, set `migration_channel="file"`.
The tasks are defined in `simulations.py` as:

- `an_bn`
- `an_bn_cn`
- `dyck_1`
- `dyck_2`
- `arithmetic`
- `toy_english`
To control the regularization method, you can override the configuration. By default, the simulation runs with the MDL regularizer, defined by `grammar_multiplier=1` and `data_given_grammar_multiplier=1`, with `regularization_method=None`. To use L2 regularization, for example, set `grammar_multiplier=0` and `regularization_method="l2"`.
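To make the effect of these settings concrete, here is an illustrative sketch of how the objective changes under such a config. The function and variable names are hypothetical, not the repository's API: with `grammar_multiplier=0` the grammar-size term |G| drops out, and an L2 penalty over the weights is added to the data-encoding term |D:G| instead.

```python
def objective(data_given_grammar, grammar_size, weights,
              grammar_multiplier=1, data_given_grammar_multiplier=1,
              regularization_method=None, reg_lambda=1.0):
    """Hypothetical sketch of the regularized objective."""
    total = (data_given_grammar_multiplier * data_given_grammar
             + grammar_multiplier * grammar_size)
    if regularization_method == "l2":
        # Add a squared-weight penalty in place of the grammar cost.
        total += reg_lambda * sum(w * w for w in weights)
    return total

# MDL objective: |G| + |D:G|
mdl = objective(data_given_grammar=100.0, grammar_size=40.0, weights=[1.0, -2.0])
# L2 objective: |D:G| + lambda * sum(w^2)
l2 = objective(data_given_grammar=100.0, grammar_size=40.0, weights=[1.0, -2.0],
               grammar_multiplier=0, regularization_method="l2")
```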
The configuration can be overridden via the command line using `--config`, where you can also specify `max_grammar_size` to limit the grammar size.
An example of running a Dyck-2 simulation locally (without MPI) with L2 regularization:

```
python main.py -s dyck_2 -n 250 --config "{\"golden_networks\": [\"dyck_2\"], \"grammar_multiplier\": 0, \"regularization_method\": \"l2\", \"no_improvement_time\": 0, \"migration_channel\": \"file\"}"
```

To recreate the entire experiment, run all four regularizers on all tasks using the appropriate `--config` command-line options.
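Escaping the JSON by hand is error-prone; one convenient alternative is to build the command string programmatically. This is just a sketch using the config keys from the example above:

```python
import json
import shlex

# Same config as the Dyck-2 example above.
config = {
    "golden_networks": ["dyck_2"],
    "grammar_multiplier": 0,
    "regularization_method": "l2",
    "no_improvement_time": 0,
    "migration_channel": "file",
}

# shlex.quote handles the shell escaping of the JSON string.
cmd = f"python main.py -s dyck_2 -n 250 --config {shlex.quote(json.dumps(config))}"
```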
Again, the tasks are defined in `simulations.py` as:

- `an_bn_train_weights_from_golden`
- `an_bn_cn_train_weights_from_golden`
- `dyck_1_train_weights_from_golden`
- `dyck_2_train_weights_from_golden`
- `arithmetic_train_weights_from_golden`
- `toy_english_train_weights_from_golden`
In these simulations, the population is initialized only with golden networks and no architecture mutations are allowed (via the configs `num_golden_copies_in_initialization=500` and `allow_architecture_changing_mutations=False`, respectively).
To train our differentiable golden networks, use the `train_mdlrnn_backprop.py` script, which trains networks on all the tasks from the paper.
To analyze results from GA simulations, maintain a CSV file that logs each simulation along with its ID, status (e.g., Running, Finished, Time Limit), and objective.
The CSV should look something like this:

```
task,server,current_job_id,state,objective,simulation_id
an_bn,ServerA,1,Finished,MDL,an_bn_prior_0.3_batch_500_<simulation_hash>
an_bn,ServerA,2,Finished,|D:G| + L1,an_bn_prior_0.3_batch_500_<simulation_hash>
an_bn,ServerA,3,Finished,|D:G| + L2,an_bn_prior_0.3_batch_500_<simulation_hash>
an_bn,ServerA,4,Finished,|D:G| + Limit |G|,an_bn_prior_0.3_batch_500_<simulation_hash>
...
```

Then, run the following script on the server that holds the results:
```
python analysis/generate_simulation_results.py
```

This will generate a results CSV that can be further analyzed locally, without needing to access the server again, using:

```
python analysis/analyze_simulation_results.py
```

For analyzing Experiment 3 results: after `train_mdlrnn_backprop.py` has run, it generates a results CSV that can be analyzed with:

```
python analysis/analyze_backprop_results.py
```

Note that the analysis scripts generate both the plot figures from the paper and the LaTeX tables.
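The tracking CSV can also be queried directly, for example to check which runs have finished. A small sketch using the column names from the example above (the helper and sample data are illustrative, not part of the repository):

```python
import csv
import io

def finished_rows(csv_text):
    """Return the rows whose 'state' column is 'Finished'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["state"] == "Finished"]

# Illustrative sample matching the tracking-CSV format.
example = """task,server,current_job_id,state,objective,simulation_id
an_bn,ServerA,1,Finished,MDL,an_bn_sim_1
an_bn,ServerA,2,Running,MDL,an_bn_sim_2
"""
done = finished_rows(example)
```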
```bibtex
@misc{2025minimumdescriptionlengthapproach,
  title={A Minimum Description Length Approach to Regularization in Neural Networks},
  author={Matan Abudy and Orr Well and Emmanuel Chemla and Roni Katzir and Nur Lan},
  year={2025},
  eprint={2505.13398},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.13398},
}
```