CUDA 13 Support for CrispASR

First of all, @CrispStrobe, awesome work. I was tired of running `pip install` for one package after another just to do speech-to-text, and whisper.cpp only covers a limited set of use cases. Keep up the good work.


I downloaded the prebuilt binaries for crispasr and tried to run:

```
! ./new/crispasr-linux-x86_64-cuda/crispasr \
  --backend parakeet \
  -m parakeet-tdt-0.6b-v3-q4_k.gguf \
  -f TranscriptionOffline/noisy_audio/audio_1.wav
```

But I got:
```
./new/crispasr-linux-x86_64-cuda/crispasr: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
```
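The error means the prebuilt binary was linked against the CUDA 12 runtime, which is not installed on this machine. A minimal sketch for listing every library the loader cannot resolve, assuming a glibc system where `ldd` is available; the helper name `missing_libs` is made up for illustration:

```shell
# List shared libraries that the dynamic loader cannot resolve.
# Reads `ldd` output on stdin and prints one library name per line.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Usage (binary path taken from the command above):
#   ldd ./new/crispasr-linux-x86_64-cuda/crispasr | missing_libs
#   ldconfig -p | grep libcudart   # which CUDA runtime versions the loader knows about
```

If only `libcudart.so.13` shows up in the `ldconfig` output, the binary needs either a CUDA 12 runtime next to it or a rebuild against CUDA 13.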

So I decided to build it myself:


```
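# Run from the directory that contains the CrispASR checkout.
cmake -S CrispASR -B CrispASR/build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON

# Build only the shared libraries (.so files):
cmake --build CrispASR/build -j"$(nproc)" --target crispasr-lib
# Or build everything, including the crispasr binary:
cmake --build CrispASR/build -j"$(nproc)"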
```
The build succeeded, so I decided to test it.

It failed when I ran it on the GPU, so I dug a little deeper:

```
gdb --batch \
  -ex "set pagination off" \
  -ex "run" \
  -ex "thread apply all bt full" \
  --args \
  ./CrispASR/build/bin/crispasr \
  --backend parakeet \
  --gpu-backend cuda \
  -m parakeet-tdt-0.6b-v3-q4_k.gguf \
  -f TranscriptionOffline/noisy_audio/audio_1.wav \
  -v
```

It showed the following output

```
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
crispasr 0.6.12 (git 5a298fa8, Release) [backends: cpu,cuda]
=== build info ===
  version       : 0.6.12
  git sha       : 5a298fa8
  git date      : 2026-06-05T09:11:41+00:00
  git subject   : docs.melotts.-_update_README_+_tts.md_with_BERT_companion_+_neural_G2P
  build date    : 2026-06-05T09:16:24Z
  build type    : Release
  compiler      : gcc 11.4.0
  os            : linux
  arch          : x86_64
  ggml backends : cpu,cuda

=== runtime env ===
  PATH                         = /opt/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin
  LD_LIBRARY_PATH              = /usr/lib64-nvidia
  LIBRARY_PATH                 = /usr/local/cuda/lib64/stubs
  CUDA_HOME                    = (unset)
  CUDA_PATH                    = (unset)
  CUDA_VISIBLE_DEVICES         = (unset)
  NVIDIA_VISIBLE_DEVICES       = all
  NVIDIA_DRIVER_CAPABILITIES   = compute,utility
  GGML_CUDA_DEBUG              = 1
  GGML_VK_VISIBLE_DEVICES      = (unset)
  GGML_VK_PIPELINE_CACHE_DEBUG = 1
  CRISPASR_USE_CUDA_COMPAT     = (unset)
  CRISPASR_BACKEND             = (unset)
  CRISPASR_CACHE_DIR           = (unset)

=== ld.so.conf.d (cuda-compat shadows host libcuda when present) ===
  /etc/ld.so.conf.d/
    987_cuda-13.conf
    000_cuda.conf
    988_cuda-12.conf

=== /usr/local/cuda/compat (forward-compat libcuda — opt-in only) ===

=== nvidia driver (host) ===
  --- /proc/driver/nvidia/version ---
  NVRM version: NVIDIA UNIX Open Kernel Module for x86_64  580.82.07  Release Build  (builder@28e54e79972f)  Thu Apr 30 18:50:30 UTC 2026
  GCC version:  Selected multilib: .;@m64

=== ggml backends + devices ===
[New Thread 0x15550b1ff000 (LWP 51519)]
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 14912 MiB):
  Device 0: Tesla T4, compute capability 7.5, VMM: yes, VRAM: 14912 MiB
  registered backends: 2
    [0] CUDA (devices: 1)
    [1] CPU (devices: 1)
  registered devices : 2
[New Thread 0x155509b4a000 (LWP 51520)]
    [0] gpu    name=CUDA0 desc=Tesla T4 mem=14807/14912 MiB id=0000:00:04.0
    [1] cpu    name=CPU desc=Intel(R) Xeon(R) CPU @ 2.00GHz mem=12975/12975 MiB id=?

crispasr[verbose]: model arg          = 'parakeet-tdt-0.6b-v3-q4_k.gguf'
crispasr[verbose]: backend arg        = 'parakeet'
crispasr[verbose]: use_gpu            = true
crispasr[verbose]: gpu_backend        = 'cuda'
crispasr[verbose]: gpu_device         = 0
crispasr[verbose]: cache_dir override = '(default)'
crispasr[verbose]: auto_download      = false
crispasr[verbose]: n_threads          = 2
crispasr[verbose]: flash_attn         = true
crispasr[verbose]: resolved model     = 'parakeet-tdt-0.6b-v3-q4_k.gguf'

Thread 1 "crispasr" received signal SIGSEGV, Segmentation fault.
0x00005555555af168 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long) ()

Thread 3 (Thread 0x155509b4a000 (LWP 51520) "cuda-EvtHandlr"):
#0  0x0000155554532c4f in __GI___poll (fds=0x1554f8000c20, nfds=11, timeout=49) at ../sysdeps/unix/sysv/linux/poll.c:29
        sc_ret = -516
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000015554831cb37 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#2  0x000015554840bd77 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#3  0x0000155548308883 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#4  0x00001555544aeac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488333968, -3690738972966642965, 23454979235840, 34, 23456230598608, 140737488334320, -1842588997115569429, -1842688164752254229}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00001555545408d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 2 (Thread 0x15550b1ff000 (LWP 51519) "cuda00001400006"):
#0  0x0000155554532c4f in __GI___poll (fds=0x555555f2bbc0, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
        sc_ret = -516
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000015554831cb37 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#2  0x000015554840bd77 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#3  0x0000155548308883 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#4  0x00001555544aeac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488333456, -3690738972966642965, 23455003045888, 2, 23456230598608, 140737488333808, -1842594234291315989, -1842688164752254229}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00001555545408d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 1 (Thread 0x1555543b6000 (LWP 51512) "crispasr"):
#0  0x00005555555af168 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long) ()
No symbol table info available.
#1  0x0000155554e0dccf in (anonymous namespace)::fill(CrispasrRegistryEntry&, (anonymous namespace)::Entry const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /content/CrispASR/build/src/libcrispasr.so.1
No symbol table info available.
#2  0x0000155554e0e381 in crispasr_registry_lookup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, CrispasrRegistryEntry&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /content/CrispASR/build/src/libcrispasr.so.1
No symbol table info available.
#3  0x0000555555634c19 in crispasr_run_backend(whisper_params const&) ()
No symbol table info available.
#4  0x000055555558f88f in main ()
No symbol table info available.
```
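Threads 2 and 3 are CUDA driver threads idling in `poll`; the SIGSEGV is in Thread 1, inside `crispasr_registry_lookup`. To make a dump like this easier to read, the frame headers of a single thread can be pulled out with a short filter. This is only a sketch: the helper name `crash_frames` and the file name `gdb.log` are illustrative, and it assumes the `thread apply all bt full` layout shown above.

```shell
# Print only the frame headers (#0, #1, ...) of one thread from a
# `thread apply all bt full` dump read on stdin.
# Usage: crash_frames 1 < gdb.log
crash_frames() {
  awk -v tid="$1" '
    # A per-thread section starts with e.g.: Thread 1 (Thread 0x... (LWP ...) "name"):
    /^Thread [0-9]+ \(/ { in_thread = ($2 == tid); next }
    # Inside the selected thread, keep frame headers and drop locals.
    in_thread && /^#[0-9]+ / { print }
  '
}
```

Since every frame in Thread 1 reports "No symbol table info available", a build configured with `-DCMAKE_BUILD_TYPE=RelWithDebInfo` (a standard CMake build type) would likely add file and line information to these frames.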

Note that the **binaries run fine** when the **backend** is **cpu**.

This can be reproduced on Google Colab.

Remove CUDA 12:
```
# Purge all pre-installed CUDA 12 packages
!apt-get --purge remove "*cuda*" "*cublas*" "*nsight*" -y
!apt-get autoremove --purge -y
!apt-get clean

# Remove leftover environment pathways
!rm -rf /usr/local/cuda*
```

Install CUDA 13 on Google Colab:

```
# Download the network repository setup for Ubuntu (Colab uses Ubuntu)
!wget https://nvidia.com
!dpkg -i cuda-keyring_1.1-1_all.deb
!apt-get update

# Install the CUDA 13 toolkit
!apt-get install cuda-13-0 -y

```
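Before configuring the build, it may be worth confirming which toolkit `nvcc` actually resolves to, since a leftover CUDA 12 `nvcc` on `PATH` would silently produce a CUDA 12 build. A small sketch, assuming the usual `nvcc --version` output with a `release X.Y` line; the helper name `cuda_release` is made up for illustration:

```shell
# Extract the toolkit release (e.g. "13.0") from `nvcc --version` output on stdin.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage (prefix the command with `!` in a Colab cell):
#   nvcc --version | cuda_release
```

If this prints nothing, `nvcc` is probably not on `PATH` for that shell.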
I downloaded the prebuilt GGUF files from your Hugging Face account:

```
!wget -O parakeet-tdt-0.6b-v3-q4_k.gguf \
"https://huggingface.co/cstr/parakeet-tdt-0.6b-v3-GGUF/resolve/main/parakeet-tdt-0.6b-v3-q4_k.gguf"
```

Will CUDA 13 be supported on Linux in the future? Last time I checked, crispasr.exe with CUDA 13 on Windows was working and even using the GPU. I can confirm that again for Windows, with logs, if you want.

Thanks
I wish this project lots of success.  











