Professor322/asr_hse

Automatic Speech Recognition (ASR)

About • Installation • How To Use • Credits • License

About

This repository is an attempt to train an ASR model. Using beam search with language model guidance, the trained model reaches roughly 24% word error rate (WER) and 11% character error rate (CER) on the test-clean dataset; the exact figures are listed below. The underlying architecture is DeepSpeech2.

val_CER_(Argmax): 0.15835317756054224
val_WER_(Argmax): 0.44245870354749056
val_CER_(BeamSearchLM): 0.1086015791507941
val_WER_(BeamSearchLM): 0.2379445747889793
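
For readers unfamiliar with the metrics above, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length (in words or characters). The function names are illustrative, and the repository's own metric code may differ in details such as text normalization or per-utterance averaging.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong word out of three, one wrong character out of eleven.
example_wer = wer("the cat sat", "the cat sit")
example_cer = cer("the cat sat", "the cat sit")
```

Because a single wrong character makes the whole word wrong, WER is typically much higher than CER, which matches the gap between the two figures reported here.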

Follow the steps described in the "How To Use" section to run inference with the best model and reproduce the stated results, or run training to create a model with the same performance.

Link to wandb artifacts

Full report can be found here

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate new environment using conda or venv (+pyenv).

    a. conda version:

    # create env
    conda create -n project_env python=PYTHON_VERSION
    
    # activate env
    conda activate project_env

    b. venv (+pyenv) version:

    # create env
    ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env
    
    # alternatively, using default python version
    python3 -m venv project_env
    
    # activate env
    source project_env/bin/activate
  2. Install all required packages

    pip install -r requirements.txt
  3. Install pre-commit:

    pre-commit install
  4. Also make sure that the gzip utility is installed.
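
As a quick way to verify step 4, the following sketch checks whether a `gzip` executable is on the `PATH`, using only the Python standard library. The helper name `find_gzip` is illustrative and not part of this repository.

```python
import shutil


def find_gzip():
    """Return the path to the gzip executable, or None if it is not on PATH."""
    return shutil.which("gzip")


if __name__ == "__main__":
    path = find_gzip()
    if path:
        print(f"gzip found at {path}")
    else:
        print("gzip not found; install it with your system package manager")
```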

How To Use

To train a model and to reproduce stated results, run the following command:

python3 train.py -cn=deepspeech_char_colab.yaml trainer.save_dir="saved"

Train for 23 epochs.
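
The training command appears to use Hydra-style arguments: `-cn` selects the config file and `trainer.save_dir="saved"` overrides a nested key. The sketch below shows, in plain Python, how such a dotted override maps onto a nested config. It is not Hydra's implementation, and the config fragment (including the `n_epochs` key) is hypothetical; the real keys live in the repository's YAML files.

```python
import copy


def apply_override(config, override):
    """Return a copy of `config` with one `dotted.key=value` override applied."""
    key, _, raw_value = override.partition("=")
    value = raw_value.strip('"')  # values stay strings in this simplified sketch
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = key.split(".")
    for part in parents:
        # Walk down the nesting, creating intermediate dicts when missing.
        node = node.setdefault(part, {})
    node[leaf] = value
    return updated


# Hypothetical config fragment, for illustration only.
base = {"trainer": {"save_dir": "default_dir", "n_epochs": 23}}
patched = apply_override(base, 'trainer.save_dir="saved"')
```

Only the named key changes; every other value keeps what the selected config file defines.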

To download the model that achieves the stated results, use this command:

gdown https://drive.google.com/uc\?id\=1Ook3vMZV9c7D-TMen6GjIBKWzaAQnwHR
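
The backslashes in the command above are only shell escapes for `?` and `=`. The same download can be sketched from Python with the `gdown` package; the output filename `model_best.pth` is an assumption chosen for illustration, not a name the repository prescribes.

```python
FILE_ID = "1Ook3vMZV9c7D-TMen6GjIBKWzaAQnwHR"  # taken from the command above


def drive_url(file_id):
    """Build the Google Drive download URL without shell escaping."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_checkpoint(output="model_best.pth"):
    """Download the checkpoint; `output` is a hypothetical local filename."""
    import gdown  # imported lazily so the URL helper works without gdown installed

    return gdown.download(drive_url(FILE_ID), output, quiet=False)
```

Calling `download_checkpoint()` returns the path of the saved file, which can then be passed as `inferencer.from_pretrained`.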

To run inference (evaluate the model or save predictions):

python3 inference.py -cn=inference.yaml inferencer.from_pretrained=<path_to_downloaded_model>

The specified language model will be downloaded automatically.
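
For contrast with LM-guided beam search, the Argmax metrics in the About section correspond to decoding without a language model. Assuming the model is trained with CTC (typical for DeepSpeech2, though not stated in this README), greedy decoding takes the best symbol per frame, merges consecutive repeats, and drops blanks. The function name, alphabet, and blank index below are illustrative, not the repository's API.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, alphabet, blank_index=0):
    """Greedy (argmax) CTC decoding over a list of per-frame score lists."""
    # 1. Pick the highest-scoring symbol index in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Merge consecutive repeats of the same index.
    collapsed = [index for index, _ in groupby(best)]
    # 3. Drop blanks and map the remaining indices to characters.
    return "".join(alphabet[index] for index in collapsed if index != blank_index)


def one_hot(index, size):
    """Toy frame whose argmax is `index`; stands in for real model output."""
    return [1.0 if k == index else 0.0 for k in range(size)]


alphabet = ["^", "c", "a", "t"]  # index 0 is the CTC blank in this toy example
frames = [one_hot(i, len(alphabet)) for i in [1, 1, 0, 2, 2, 0, 3]]
decoded = ctc_greedy_decode(frames, alphabet)  # "cat"
```

Beam search keeps several candidate transcripts per frame and rescores them with the language model, which is where the improvement from the Argmax to the BeamSearchLM figures comes from.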

Credits

This repository is based on a PyTorch Project Template.

License

License
