Project repository for the Knowledge Engineering course. Project name: Automatic Speech Recognition (ASR): building, training, and running inference with a Recurrent Neural Network for offline automatic subtitle generation.
Based on:
- Python 3.9
- Tensorflow 2.6.1
Example of usage:
- Clone the repository:
git clone https://github.com/uigiporc/icon-sr.git
- Download and extract the dataset from LibriSpeech
curl.exe -o dataset.tar.gz https://www.openslr.org/resources/12/train-clean-360.tar.gz
tar -xzvf dataset.tar.gz
- Install the dependencies in requirements.txt
pip3 install -r icon-sr/preprocessing/requirements.txt
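The download and extraction commands above assume the Windows curl.exe binary. A cross-platform alternative is a short standard-library Python script; the URL mirrors the command above, and the file names are just the ones used in these instructions:

```python
import tarfile
import urllib.request
from pathlib import Path

DATASET_URL = "https://www.openslr.org/resources/12/train-clean-360.tar.gz"

def download(url: str, dest: Path) -> Path:
    """Download url to dest unless the file is already present."""
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest

def extract(archive: Path, out_dir: Path) -> None:
    """Extract a .tar.gz archive into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(out_dir)

# Usage (mirrors the shell commands above):
# archive = download(DATASET_URL, Path("dataset.tar.gz"))
# extract(archive, Path("."))
```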
Preprocessing is necessary only if you want to train the model from scratch. To do so, edit DATASET_PATH and PROCESSED_PATH in preprocessing/preprocessing.py. Then run:
python3 icon-sr/preprocessing/preprocessing.py
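The exact features produced by preprocessing.py are not documented here, but a typical ASR pipeline converts each waveform into log-magnitude spectrogram frames. The sketch below illustrates that step with plain NumPy; the frame length and hop are illustrative values for 16 kHz audio, not necessarily the ones used in this repository:

```python
import numpy as np

def log_spectrogram(signal: np.ndarray, frame_len: int = 400,
                    hop: int = 160, eps: float = 1e-10) -> np.ndarray:
    """Frame the waveform, apply a Hann window, and return log-magnitude
    STFT features of shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(num_frames)])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitudes + eps)
```

With the defaults above, one second of 16 kHz audio yields 98 frames of 201 frequency bins each, which is the kind of 2-D input a recurrent acoustic model consumes.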
To train the model, upload PROCESSED_PATH to a Google Cloud bucket (for TPU hardware acceleration) or Google Drive (for GPU or no hardware acceleration). Then set the paths in the notebook speech_to_text.ipynb and run it on Colab.
With LibriSpeech train-clean-360, expect around 60 minutes per epoch on a TPU.
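The notebook's actual setup cell may differ, but a typical Colab fragment for choosing between TPU and GPU/CPU, with placeholder paths standing in for your Bucket/Drive copy of PROCESSED_PATH, looks roughly like this:

```python
import tensorflow as tf

# Hypothetical setup cell: use a TPU when Colab provides one, otherwise
# fall back to the default (GPU or CPU) strategy. The paths below are
# placeholders, not values from this repository.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
    PROCESSED_PATH = "gs://your-bucket/processed"          # Cloud Storage for TPU
except ValueError:
    strategy = tf.distribute.get_strategy()
    PROCESSED_PATH = "/content/drive/MyDrive/processed"    # Drive for GPU/CPU
```

Note that a TPU can only read training data from a Cloud Storage bucket, which is why the two accelerator choices imply different upload targets.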
For live inference, install:
pip3 install pipwin
pipwin install pyaudio
Enable Stereo Mix and set it as your default input device (this is done from the Windows sound settings).
Then run:
python3 inference.py
Untested; it should work without major changes.
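inference.py presumably reads the Stereo Mix stream in fixed-size chunks, the way PyAudio's stream.read delivers raw 16-bit PCM. Independent of PyAudio, the buffering logic can be sketched as a hypothetical helper (not taken from inference.py) that turns raw byte chunks into fixed-length float windows ready for the model:

```python
import numpy as np

def frames_to_windows(frames, window_size=16000):
    """Accumulate raw 16-bit PCM byte chunks (as read from an audio
    stream) and yield float32 windows of window_size samples in [-1, 1]."""
    buf = np.empty(0, dtype=np.int16)
    for chunk in frames:
        buf = np.concatenate([buf, np.frombuffer(chunk, dtype=np.int16)])
        while len(buf) >= window_size:
            window, buf = buf[:window_size], buf[window_size:]
            yield window.astype(np.float32) / 32768.0

# With PyAudio, the chunks would come from stream.read(1024) in a loop.
```

Decoupling the windowing from the capture device like this also makes the streaming path testable without a microphone.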
