Could you add documentation on how to train the models? Specifically:

- What format should the input files have?
- What command is used to run training?

I see training code in the repository, but there is no explanation of how it works.