Keep it Simple (KiS)

This repository contains the code for the ACL 2021 paper "Keep It Simple: Unsupervised Simplification of Multi-Paragraph Text".

Installation for Training Procedure

The requirements.txt file lists the pip packages required to use and train the models. A spaCy English model must also be installed:

python -m spacy download en_core_web_sm

Training also requires the apex library, used for mixed-precision training. It is not available on pip and must be installed manually (see: https://github.com/nvidia/apex).
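Before launching a long training run, it can help to confirm that the dependencies above are importable. The sketch below is an illustrative check, not part of the repository: the helper name `check_kis_dependencies` is hypothetical, and the module list is an assumption (torch and transformers are inferred from the HuggingFace checkpoints, `en_core_web_sm` is the package installed by `spacy download`, and `apex` is NVIDIA's package). Adjust it to match requirements.txt.

```python
import importlib.util

# Assumed import names; edit to match requirements.txt.
MODULES = ["torch", "transformers", "spacy", "en_core_web_sm", "apex"]


def check_kis_dependencies() -> dict:
    """Hypothetical helper: map each module name to whether it can be imported.

    find_spec only locates the module; it does not import it, so this is cheap.
    """
    return {name: importlib.util.find_spec(name) is not None for name in MODULES}


if __name__ == "__main__":
    for name, found in check_kis_dependencies().items():
        print(f"{name:16s} {'OK' if found else 'MISSING'}")
```

A `MISSING` entry for `apex` is expected until it has been built from source as described above.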

Training needs two pre-trained models, which we provide in the Keep it Simple Release:

coverage_roberta.bin: a checkpoint compatible with roberta-base in the HuggingFace RoBERTa implementation, used by the salience scorer (coverage model).
gpt2_med_cp90.bin: a checkpoint compatible with gpt2-medium in the HuggingFace GPT-2 implementation, used as the initial model for the generator.

Once the packages are installed and the models downloaded, run the training script:

python train_keep_it_simple.py --experiment initial_run --model_start_file /path/to/gpt2_med_cp90.bin
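If you prefer to script runs from Python (for example, to launch several experiments under different names), the command above can be wrapped as follows. This is a minimal sketch: `build_train_command` and `launch_training` are hypothetical helpers, and only the two flags shown in this README are passed; any other hyper-parameters would need to be added by reading the script.

```python
import subprocess
import sys


def build_train_command(experiment: str, model_start_file: str) -> list:
    """Hypothetical helper: the README's training command as an argument list."""
    return [
        sys.executable, "train_keep_it_simple.py",
        "--experiment", experiment,
        "--model_start_file", model_start_file,
    ]


def launch_training(experiment: str, model_start_file: str) -> int:
    """Hypothetical helper: run training (blocking) and return the exit code."""
    completed = subprocess.run(build_train_command(experiment, model_start_file))
    return completed.returncode
```

Usage: `launch_training("initial_run", "/path/to/gpt2_med_cp90.bin")` from the repository root. Passing an argument list rather than a shell string avoids quoting problems with paths that contain spaces.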

See the script for additional hyperparameters. With the default hyperparameters, training on a single V100 (or equivalent) GPU should converge within 16 to 24 hours to a model achieving a strong, though not optimal, score.

The provided training script uses CCNews as a rudimentary demonstration dataset; it is not the dataset used to obtain the results in our experiments (we used a larger news corpus that we cannot release for copyright reasons). We recommend replacing CCNews with in-domain data for better results.
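How the training script consumes its data is not described in this README, so the sketch below covers only a generic preprocessing step: splitting raw in-domain documents into paragraphs, since the model operates on paragraphs. The function name `iter_paragraphs`, the blank-line paragraph convention, and the word-count bounds are all assumptions for illustration, not values taken from the paper or the code.

```python
import re
from typing import Iterable, Iterator


def iter_paragraphs(documents: Iterable[str], min_words: int = 30,
                    max_words: int = 150) -> Iterator[str]:
    """Hypothetical helper: yield mid-length paragraphs from raw documents.

    Assumes paragraphs are separated by blank lines. The default bounds are
    illustrative only; tune them to your corpus.
    """
    for doc in documents:
        for block in re.split(r"\n\s*\n", doc):
            paragraph = " ".join(block.split())  # collapse line wraps and extra spaces
            n_words = len(paragraph.split())
            if min_words <= n_words <= max_words:
                yield paragraph
```

Filtering out very short blocks drops headlines, bylines, and captions, which are unlikely to be useful simplification inputs.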

Example Training Run

To ease debugging and reproducibility, we release the log of an example Keep it Simple training run. It can be accessed as a view-only Wandb report.

Running a Trained Model

To simplify text with a trained model, an example script is provided:

python run_keep_it_simple.py --model_card gpt2-medium --model_file /home/phillab/models/ACL2021/gpt2_med_keep_it_simple.bin

For a given input paragraph, the script outputs several candidate simplifications, highlighting the model's insertions in green and its deletions in red.
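A similar colored view can be produced for any pair of texts with a word-level diff. The sketch below uses Python's standard `difflib` and ANSI escape codes, assuming a terminal that renders them. It follows the color convention described above (insertions green, deletions red), but the function `color_word_diff` and its whitespace tokenization are our own illustration, not the repository's implementation.

```python
import difflib

GREEN = "\033[92m"
RED = "\033[91m"
RESET = "\033[0m"


def color_word_diff(original: str, simplified: str) -> str:
    """Hypothetical helper: render `simplified` against `original` word by word.

    Deleted words are wrapped in red, inserted words in green; a 'replace'
    is shown as the deleted words followed by the inserted ones.
    """
    src, tgt = original.split(), simplified.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(src[i1:i2])
            continue
        if tag in ("delete", "replace"):
            pieces.extend(f"{RED}{w}{RESET}" for w in src[i1:i2])
        if tag in ("insert", "replace"):
            pieces.extend(f"{GREEN}{w}{RESET}" for w in tgt[j1:j2])
    return " ".join(pieces)


if __name__ == "__main__":
    print(color_word_diff("the cat sat on the mat", "the cat sat on a mat"))
```

Diffing at the word level rather than the character level keeps the highlighted edits readable for paragraph-length rewrites.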

In the Keep it Simple Release, we provide gpt2_med_keep_it_simple.bin, a model checkpoint trained with the Keep it Simple procedure that achieves a high average reward on news paragraphs. It is intended to ease use of the model and comparison with future text simplification models.

Cite the work

If you make use of the code, models, or algorithm, please cite our paper:

@inproceedings{laban2021keep_it_simple,
  title={Keep It Simple: Unsupervised Simplification of Multi-Paragraph Text},
  author={Philippe Laban and Tobias Schnabel and Paul N. Bennett and Marti A. Hearst},
  booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
  volume={1},
  year={2021}
}

Contributing

If you'd like to contribute, or have questions or suggestions, you can contact us at phillab@berkeley.edu. All contributions are welcome, for example if you have a type of text data to which you would like to apply Keep it Simple.

