OpenAI Whisper-compatible ASR web service built for NVIDIA RTX 5090 (sm_120 / Blackwell).
Uses faster-whisper with CTranslate2 on CUDA 12.8 — no PyTorch required.
Run from Docker Hub:
# docker-compose.yml
services:
whisper:
image: cipherdolls/whisper:latest
runtime: nvidia
ports:
- "9000:9000"
environment:
- ASR_MODEL=large-v3-turbo
- COMPUTE_TYPE=float16
volumes:
- whisper-models:/root/.cache/huggingface
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
volumes:
whisper-models:docker compose upOr build from source:
docker compose up --buildSwagger UI: http://localhost:9000/docs
Transcribe or translate audio. Compatible with onerahmet/whisper-asr-webservice API.
Query params:
task—transcribe(default) ortranslatelanguage— language code (e.g.en,de,fr), auto-detected if omittedoutput—json(default),txt,vtt,srt,tsvword_timestamps—true/falseencode—true/falsevad_filter—true(default) /false
Body: audio_file (multipart form upload)
curl -X POST "http://localhost:9000/asr?task=transcribe&output=json&language=en&word_timestamps=true" \
-F "audio_file=@recording.wav"Detect the language of an audio file.
Returns model status, device, and compute type.
| Env Var | Default | Description |
|---|---|---|
ASR_MODEL |
large-v3-turbo |
Whisper model size |
ASR_MODEL_PATH |
/root/.cache/huggingface |
Model cache directory |
ASR_DEVICE |
cuda |
cuda or cpu |
COMPUTE_TYPE |
float16 |
float16 recommended (INT8 broken on sm_120) |
MODEL_IDLE_TIMEOUT |
0 |
Seconds before unloading idle model (0 = never) |
No pre-built Whisper Docker image supports RTX 5090 / Blackwell (sm_120). This image uses:
nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04for sm_120 supportfloat16compute type (INT8 tensor cores on Blackwell require padding fixes not yet in stable CTranslate2)- No PyTorch dependency — faster-whisper uses CTranslate2 directly