Shared GPU worker runtime crates for Wavey services.
This workspace is the extraction point for runtime concerns that should not live inside model-specific applications such as ASR or EnCodec.
- `gpu-worker-core`
  - generic job metadata and executor traits
  - no ONNX, Torch, or upload-response assumptions
- `gpu-worker`
  - feature-gated facade crate for app services
  - re-exports the shared core, runtime, and upload-response adapters behind modules
- `gpu-worker-ort`
  - shared ONNX Runtime bootstrap
  - provider policy for CPU, CUDA, TensorRT, and CoreML
  - session construction and runtime discovery helpers
- `gpu-worker-torch`
  - shared libtorch/tch helpers
  - CUDA device, module loading, tensor construction, synchronization
- `gpu-worker-upload-response`
  - shared adapter over `upload-response`
  - local job abstraction for `request -> stage` and `stage -> response`
  - reusable local worker loop with claim/inflight/heartbeat handling
  - keeps transport concerns out of model workers
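
To make the "generic job metadata and executor traits" idea concrete, here is a minimal sketch of what a backend-agnostic core layer could look like. The names `JobMeta`, `JobExecutor`, and `ByteCounter` are assumptions made for illustration and are not taken from `gpu-worker-core`; the point is only that nothing in this layer mentions ONNX, Torch, or the transport.

```rust
use std::time::Duration;

/// Hypothetical backend-agnostic job metadata; the fields are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct JobMeta {
    pub job_id: String,
    pub attempt: u32,
    pub timeout: Duration,
}

/// Hypothetical executor trait: the core layer only fixes the shape
/// "take a job input, produce an output or an error".
pub trait JobExecutor {
    type Input;
    type Output;
    type Error;

    fn execute(
        &mut self,
        meta: &JobMeta,
        input: Self::Input,
    ) -> Result<Self::Output, Self::Error>;
}

/// Toy executor standing in for a model-specific worker.
pub struct ByteCounter;

impl JobExecutor for ByteCounter {
    type Input = Vec<u8>;
    type Output = usize;
    type Error = String;

    fn execute(&mut self, meta: &JobMeta, input: Vec<u8>) -> Result<usize, String> {
        // Reject empty payloads so the error path is visible.
        if input.is_empty() {
            return Err(format!("job {} has an empty payload", meta.job_id));
        }
        Ok(input.len())
    }
}
```

Using associated types keeps the trait free of any tensor or session type, so an ONNX-backed or Torch-backed worker could implement it without the core crate depending on either runtime.
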
The intended dependency direction is:
- transport/queue adapter
- backend runtime
- model-specific execution
Concretely:
- `upload-response` owns the generic stream/ring transport
- `gpu-worker::upload_response` adapts that transport into worker jobs
- `gpu-worker-ort` and `gpu-worker-torch` own backend runtime policy
- app crates such as `asr-onnx`, `asr-torch`, and `encodec-rs` should only own model semantics, preprocessing, and postprocessing
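
The layering above can be sketched with three small pieces, one per layer. Everything here is hypothetical (`JobSource`, `Backend`, `Model`, `drain`, and the in-memory stand-ins are invented for the example and are not the crates' real APIs); it only illustrates that the model code sees neither the queue nor the runtime internals.

```rust
use std::collections::VecDeque;

/// Layer 1 (hypothetical): transport/queue adapter that hands out raw payloads.
pub trait JobSource {
    fn claim(&mut self) -> Option<(u64, Vec<u8>)>;
    fn respond(&mut self, id: u64, payload: Vec<u8>);
}

/// Layer 2 (hypothetical): backend runtime that maps input values to outputs.
pub trait Backend {
    fn run(&mut self, input: &[f32]) -> Vec<f32>;
}

/// Layer 3 (hypothetical): model code owns only pre- and postprocessing.
pub struct Model<B: Backend> {
    pub backend: B,
}

impl<B: Backend> Model<B> {
    pub fn infer(&mut self, raw: &[u8]) -> Vec<u8> {
        // Preprocessing: bytes to floats in [0, 1].
        let input: Vec<f32> = raw.iter().map(|&b| f32::from(b) / 255.0).collect();
        let output = self.backend.run(&input);
        // Postprocessing: floats back to bytes.
        output
            .iter()
            .map(|&x| (x.clamp(0.0, 1.0) * 255.0).round() as u8)
            .collect()
    }
}

/// Worker loop: claims jobs until the source is empty, returns the job count.
pub fn drain<S: JobSource, B: Backend>(source: &mut S, model: &mut Model<B>) -> usize {
    let mut done = 0;
    while let Some((id, payload)) = source.claim() {
        let out = model.infer(&payload);
        source.respond(id, out);
        done += 1;
    }
    done
}

/// In-memory stand-in for the transport layer.
pub struct MemoryQueue {
    pub pending: VecDeque<(u64, Vec<u8>)>,
    pub responses: Vec<(u64, Vec<u8>)>,
}

impl JobSource for MemoryQueue {
    fn claim(&mut self) -> Option<(u64, Vec<u8>)> {
        self.pending.pop_front()
    }
    fn respond(&mut self, id: u64, payload: Vec<u8>) {
        self.responses.push((id, payload));
    }
}

/// Stand-in backend that returns its input unchanged.
pub struct Identity;

impl Backend for Identity {
    fn run(&mut self, input: &[f32]) -> Vec<f32> {
        input.to_vec()
    }
}
```

Swapping `Identity` for an ONNX-backed or Torch-backed implementation would leave `Model` and `drain` untouched, which is the property the dependency direction is meant to protect.
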
Current first-phase extraction:
- `encodec-rs` uses `gpu-worker-ort` for ONNX session construction
- `gpu-worker::upload_response` provides the first reusable local worker job abstraction on top of named intermediate stages
- `asr-api` uses the facade crate for shared local and remote upload-response worker loops
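
As a rough illustration of a job abstraction built on named intermediate stages, the sketch below models the two job shapes, `request -> stage` and `stage -> response`, against an in-memory store. `JobKind`, `LocalStages`, and `run_job` are hypothetical names chosen for this example; the real adapter works over the `upload-response` transport rather than a `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical description of where a local job reads from and writes to.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum JobKind {
    RequestToStage { output_stage: String },
    StageToResponse { input_stage: String },
}

/// In-memory stand-in for the transport: staged payloads are keyed by
/// (stage name, request id), responses by request id.
#[derive(Default)]
pub struct LocalStages {
    staged: HashMap<(String, u64), Vec<u8>>,
    pub responses: HashMap<u64, Vec<u8>>,
}

impl LocalStages {
    /// Runs one job for one request. Returns false when a stage -> response
    /// job finds no staged input for that request.
    pub fn run_job<F>(
        &mut self,
        kind: &JobKind,
        request_id: u64,
        request: &[u8],
        work: F,
    ) -> bool
    where
        F: FnOnce(&[u8]) -> Vec<u8>,
    {
        match kind {
            JobKind::RequestToStage { output_stage } => {
                // Read the request body, write the result to a named stage.
                self.staged
                    .insert((output_stage.clone(), request_id), work(request));
                true
            }
            JobKind::StageToResponse { input_stage } => {
                // Consume the staged payload and publish the final response.
                match self.staged.remove(&(input_stage.clone(), request_id)) {
                    Some(input) => {
                        self.responses.insert(request_id, work(&input));
                        true
                    }
                    None => false,
                }
            }
        }
    }
}
```

A two-step pipeline would run a `RequestToStage` job writing to a stage such as `"features"`, then a `StageToResponse` job reading from the same stage name; the stage name is the only coupling between the two workers.
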
Still to do:
- remote worker/job discovery in the upload-response adapter
- a generic worker loop/batching layer on top of `gpu-worker-core`
- thin app worker binaries that replace the remaining in-crate thread pools