EngVision Whisper Service

Overview

EngVision Whisper is a speech-to-text service built on OpenAI's Whisper model. It transcribes audio files and can evaluate pronunciation against a reference sentence. The service is built with Flask and can be deployed as a Docker container.

Requirements

- Docker
- Python 3.7 or later

Installation and Setup

Building the Docker Image

Clone the Repository: Clone the repository to your local machine.

```
git clone https://github.com/EngVision/Whisper-Service
cd Whisper-Service
```

Build Docker Image:
```
docker build -t engvision-whisper .
```

Running the Container

Run the Docker container:

```
docker run -p 8000:8000 engvision-whisper
```

Usage

Starting the Service

Once the container is running, the service will be available at http://localhost:8000/.

API Endpoints

Root Endpoint (GET /):
- Description: A simple endpoint to confirm that the service is running.
- Response: "Hello from EngVision Whisper!"
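
To confirm from a script that the container is reachable, a check along these lines may help. It is a minimal sketch using only the Python standard library; the helper name `is_service_up` is hypothetical, and the base URL assumes the `8000:8000` port mapping shown above.

```python
import urllib.error
import urllib.request

# Assumption: the container was started with the 8000:8000 port mapping.
BASE_URL = "http://localhost:8000"


def is_service_up(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET / answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(f"{base_url}/", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx HTTP errors.
        return False
```

Calling `is_service_up()` right after `docker run` can return False for a short while, since the service may still be loading when the container starts.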

Speech to Text Endpoint (POST /stt/<file_id>):
- Description: Upload an audio file for transcription.
- Response:
```
{
  "_id": "65dac9d6b407958caa1af7f7",
  "file_id": "65dac9d6b407958caa1af7f7",
  "status": "processing",
  "text": null
}
```

Check Status Endpoint (GET /stt/<file_id>):
- Description: Check the status of a transcription task.
- Response: A JSON object with the status of the task and, once the task is completed, the transcribed text.
```
{
  "_id": "65dac9d6b407958caa1af7f7",
  "file_id": "65dac9d6b407958caa1af7f7",
  "status": "completed",
  "text": " This is my favorite food."
}
```
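
Because transcription is asynchronous, a client usually has to poll the status endpoint until `status` changes from `processing` to `completed`. The following is a minimal sketch of that loop using only the standard library. The function names, the polling interval, and the attempt limit are assumptions made for illustration, not part of the service.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: the container was started with the 8000:8000 port mapping.
BASE_URL = "http://localhost:8000"


def fetch_status(file_id: str, base_url: str = BASE_URL) -> dict:
    """GET /stt/<file_id> and decode the JSON task document."""
    with urllib.request.urlopen(f"{base_url}/stt/{file_id}", timeout=10) as response:
        return json.load(response)


def wait_for_transcript(
    file_id: str,
    fetch: Callable[[str], dict] = fetch_status,
    poll_seconds: float = 2.0,
    max_attempts: int = 30,
) -> Optional[str]:
    """Poll until the task reports "completed"; return its text, or None if it never does."""
    for attempt in range(max_attempts):
        task = fetch(file_id)
        if task.get("status") == "completed":
            return task.get("text")
        # Sleep between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(poll_seconds)
    return None
```

The `fetch` parameter exists so the loop can be exercised without a running service. The returned text may start with a space, as in the sample response above, so callers may want to apply `.strip()`.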

Speech Evaluation Endpoint (POST /speech-evaluation):
- Description: Compare a recorded utterance with the original sentence and return IPA transcripts together with a pronunciation evaluation.
- Body:
```
{
  "fileId": "65dac87bd469ca2adaed6f98",
  "original": "This is my favourite food"
}
```
- Response:
```
{
  "_id": "65dac87bd469ca2adaed6f98",
  "correct_letters": "111 11 111 111111111 111 ",
  "original_ipa_transcript": "ðɪs ɪz maɪ ˈfeɪvərɪt fud",
  "original_transcript": "This is my favourite food",
  "pronunciation_accuracy": "95",
  "submission_id": "65dac87bd469ca2adaed6f98",
  "voice_ipa_transcript": "ðɪs ɪz maɪ ˈfeɪvərɪt fud",
  "voice_transcript": "this is my favorite food"
}
```
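
A Python client for this endpoint could look roughly like the sketch below. It assumes the body is sent as JSON with a `Content-Type: application/json` header, which this README does not state explicitly. The helper names are hypothetical. Note that `pronunciation_accuracy` arrives as a string in the sample response, so the sketch converts it to a number.

```python
import json
import urllib.request

# Assumption: the container was started with the 8000:8000 port mapping.
BASE_URL = "http://localhost:8000"


def build_evaluation_request(
    file_id: str, original: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /speech-evaluation request with the documented body fields."""
    body = json.dumps({"fileId": file_id, "original": original}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/speech-evaluation",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed encoding
        method="POST",
    )


def summarize_evaluation(result: dict) -> dict:
    """Pick out the fields a caller is most likely to need from the response."""
    return {
        # The sample response encodes the score as a string ("95").
        "accuracy": float(result["pronunciation_accuracy"]),
        "ipa_matches": result["original_ipa_transcript"] == result["voice_ipa_transcript"],
        "heard": result["voice_transcript"],
    }


def evaluate(file_id: str, original: str, base_url: str = BASE_URL) -> dict:
    """Send the request and return the summarized evaluation."""
    request = build_evaluation_request(file_id, original, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize_evaluation(json.load(response))
```

For example, `evaluate("65dac87bd469ca2adaed6f98", "This is my favourite food")` would send the body shown above, provided the service is running and the file ID exists.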

Example Usage

Uploading an Audio File for Transcription:

```
curl -F "file=@path_to_audio_file" http://localhost:8000/stt/<file_id>
```

Checking Transcription Status:

```
curl http://localhost:8000/stt/<file_id>
```

Contributing

Contributions are welcome. Please fork the repository and submit a pull request with your changes.
