
Knowledge Engineering project. Transparent offline speech recognition based on a deep learning model. Authors: Porcelli Luigi, Nicolò Cucinotta.


ncucinotta/icon-sr


icon-sr

Repository of the Knowledge Engineering project. The project builds, trains, and runs inference with a Recurrent Neural Network for Automatic Speech Recognition (ASR), used to generate subtitles automatically and offline.

Based on:

  • Python 3.9
  • TensorFlow 2.6.1
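A small sanity check can catch version mismatches before a long training run. The sketch below compares an interpreter version and a TensorFlow version against the ones listed above and returns warnings instead of aborting; the function name `check_environment` and the warn-only behaviour are assumptions, not part of the repository.

```python
# Versions listed in this README; other versions may or may not work.
EXPECTED_PYTHON = (3, 9)
EXPECTED_TF = "2.6.1"


def check_environment(python_version, tf_version):
    """Return a list of human-readable mismatches (empty if everything matches).

    python_version: a (major, minor) tuple, e.g. sys.version_info[:2]
    tf_version: a version string such as tf.__version__, or None if
    TensorFlow is not installed.
    """
    problems = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        problems.append(
            "Python %d.%d found, %d.%d expected"
            % (python_version[0], python_version[1], *EXPECTED_PYTHON)
        )
    if tf_version is None:
        problems.append("TensorFlow is not installed")
    elif tf_version != EXPECTED_TF:
        problems.append("TensorFlow %s found, %s expected" % (tf_version, EXPECTED_TF))
    return problems
```

Usage: call it with `sys.version_info[:2]` and `tf.__version__` (or `None` if `import tensorflow` fails) and print any returned messages.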

[Image: example of usage]

Prerequisites

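  • Clone the repository:
git clone https://github.com/uigiporc/icon-sr.git
  • Download and extract the LibriSpeech train-clean-360 dataset (on Linux, use curl instead of curl.exe):
curl.exe -o train-clean-360.tar.gz https://www.openslr.org/resources/12/train-clean-360.tar.gz
tar -xzvf train-clean-360.tar.gz
  • Install the dependencies listed in preprocessing/requirements.txt:
pip3 install -r icon-sr/preprocessing/requirements.txt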
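If curl or tar are not available, the download and extraction steps can also be done with the Python standard library alone. This is a minimal sketch, not a script shipped with the repository; the function name `download_and_extract` is hypothetical. It skips the download when the archive already exists and uses tarfile's `"data"` extraction filter on Python versions that provide it.

```python
import tarfile
import urllib.request
from pathlib import Path

DATASET_URL = "https://www.openslr.org/resources/12/train-clean-360.tar.gz"


def download_and_extract(url, archive_path, dest_dir):
    """Download a .tar.gz archive (unless already present) and extract it.

    Does the same job as the curl.exe + tar commands above, using only the
    standard library so it behaves the same on Windows and Linux.
    """
    archive_path = Path(archive_path)
    dest_dir = Path(dest_dir)
    if not archive_path.exists():
        # The train-clean-360 archive is large; this can take a long time.
        urllib.request.urlretrieve(url, archive_path)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject absolute paths, links outside
            # dest_dir and other unsafe archive members.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return dest_dir
```

Usage: `download_and_extract(DATASET_URL, "train-clean-360.tar.gz", ".")`.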

Preprocessing

Preprocessing is needed only if you want to train the model from scratch. To do so, set DATASET_PATH (where the dataset was extracted) and PROCCESSED_PATH (where the processed data will be written) in preprocessing/preprocessing.py. Then run:

python3 icon-sr/preprocessing/preprocessing.py
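For orientation, LibriSpeech stores each chapter in a `<speaker>/<chapter>/` folder containing one `<speaker>-<chapter>.trans.txt` file, whose lines have the form `<utterance-id> <TRANSCRIPT>`, next to one `<utterance-id>.flac` file per utterance. The sketch below pairs audio files with their transcripts. It only illustrates a typical first preprocessing step and is not the logic of preprocessing/preprocessing.py; the function name `load_librispeech_pairs` is hypothetical.

```python
from pathlib import Path


def load_librispeech_pairs(dataset_path):
    """Return a list of (flac_path, transcript) pairs found under dataset_path.

    Reads every *.trans.txt file; each line is "<utterance-id> <TRANSCRIPT>"
    and the matching audio is "<utterance-id>.flac" in the same folder.
    Utterances whose audio file is missing are skipped.
    """
    pairs = []
    for trans_file in sorted(Path(dataset_path).rglob("*.trans.txt")):
        with open(trans_file, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                # Split only on the first space: the rest is the transcript.
                utt_id, _, text = line.partition(" ")
                flac = trans_file.parent / (utt_id + ".flac")
                if flac.exists():
                    pairs.append((flac, text))
    return pairs
```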

Training

To train the model, upload the contents of PROCESSED_PATH to a Google Cloud Storage bucket (for TPU hardware acceleration) or to Google Drive (for GPU or no hardware acceleration). Then set the paths in the speech_to_text.ipynb notebook and run it on Colab.

With LibriSpeech train-clean-360, expect around 60 minutes per epoch on a TPU.
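Which path the notebook should use depends on where the processed data was uploaded. A minimal sketch of that choice, assuming a hypothetical bucket name and Drive folder (neither comes from the repository): the TPU runtime reads from a `gs://` path, while GPU and CPU-only runtimes read from Google Drive after it has been mounted in Colab, typically at `/content/drive` via `google.colab.drive.mount`.

```python
# Hypothetical locations: replace them with your own bucket and Drive folder.
GCS_DATA_ROOT = "gs://my-icon-sr-bucket/processed"
DRIVE_DATA_ROOT = "/content/drive/MyDrive/icon-sr/processed"


def data_root(accelerator):
    """Return the folder the notebook should read the processed data from.

    accelerator: the Colab runtime type, "TPU", "GPU" or "None" (any case).
    TPU runs use the Cloud Storage bucket; GPU and CPU-only runs use Drive.
    """
    accelerator = accelerator.upper()
    if accelerator == "TPU":
        return GCS_DATA_ROOT
    if accelerator in ("GPU", "NONE", "CPU"):
        return DRIVE_DATA_ROOT
    raise ValueError("unknown accelerator: %r" % accelerator)
```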

Inference

Windows

Install:

pip3 install pipwin
pipwin install pyaudio

Enable Stereo Mix and set it as your default input device, so that the audio played by your computer is used as input. For instructions, see this
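Before running inference, it can help to confirm that Stereo Mix is visible as an input device and is the current default. A minimal sketch using PyAudio's `get_device_count()`, `get_device_info_by_index()` and `get_default_input_device_info()`; the helpers `input_devices` and `list_input_devices` are hypothetical and not part of inference.py. Device names can be localised, so the name check is only a heuristic.

```python
def input_devices(device_infos):
    """Filter PyAudio device-info dicts down to (index, name) capture devices."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_input_devices():
    """Print the capture devices and the current default input device."""
    import pyaudio  # imported here so input_devices() works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
        for index, name in input_devices(infos):
            print(index, name)
        # Raises IOError if no default input device is configured.
        default = pa.get_default_input_device_info()
        print("Default input:", default["name"])
        if "stereo mix" not in default["name"].lower():
            print("Stereo Mix does not appear to be the default input device.")
    finally:
        pa.terminate()
```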

Then run:

python3 inference.py

Linux

Untested, but it should work with few changes.
