CUDA 13 Support for CrispASR #152

@deepanshu-yadav

First of all, @CrispStrobe, awesome work. I was tired of running pip install for one package after another just to do speech-to-text, and whisper.cpp only covers a limited set of use cases. Keep up the good work.

I downloaded the prebuilt crispasr binaries and tried to run:

! ./new/crispasr-linux-x86_64-cuda/crispasr \
  --backend parakeet \
  -m parakeet-tdt-0.6b-v3-q4_k.gguf \
  -f TranscriptionOffline/noisy_audio/audio_1.wav

But I got:

./new/crispasr-linux-x86_64-cuda/crispasr: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
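
The error above says the prebuilt binary was linked against the CUDA 12 runtime (libcudart.so.12), while a CUDA 13 toolkit ships libcudart.so.13 instead. A minimal sketch for listing every library the loader cannot resolve, using the standard ldd and ldconfig tools; the helper name missing_libs is made up for illustration:

```shell
# Print the names of shared libraries that the dynamic loader cannot resolve
# for the given executable. Empty output means everything resolved.
missing_libs() {
  ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Usage with the path from this report (uncomment to run):
#   missing_libs ./new/crispasr-linux-x86_64-cuda/crispasr
# List the CUDA runtime versions the loader currently knows about:
#   ldconfig -p | grep libcudart
```

If libcudart.so.12 shows up as missing and only a .so.13 runtime is listed, the prebuilt binary cannot start on this machine without a CUDA 12 runtime next to it.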

So I decided to build it myself:

cd CrispASR/ && cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON

cd CrispASR/ && cmake --build build -j$(nproc) --target crispasr-lib  # for only .so files
cd CrispASR/ && cmake --build build -j$(nproc)
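
One variation on the build above is to pin the CUDA architecture to the GPU that is actually present. The sketch below assumes the project honors the standard CMake variable CMAKE_CUDA_ARCHITECTURES (not confirmed for CrispASR) and falls back to 7.5, the compute capability of the Tesla T4 reported in the log further down. cap_to_arch is a hypothetical helper name.

```shell
# Convert a compute capability such as "7.5" into the form CMake expects ("75").
cap_to_arch() {
  printf '%s\n' "$1" | tr -d '.'
}

# Ask the driver for the capability of the first GPU; fall back to the T4
# value (7.5) when nvidia-smi is unavailable.
CAP=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1 || true)
ARCH=$(cap_to_arch "${CAP:-7.5}")

if [ -d CrispASR ]; then
  # Assumption: CMAKE_CUDA_ARCHITECTURES is respected by this project's CUDA build.
  cmake -S CrispASR -B CrispASR/build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_CUDA=ON \
    -DCMAKE_CUDA_ARCHITECTURES="$ARCH"
  cmake --build CrispASR/build -j"$(nproc)"
else
  echo "CrispASR checkout not found in the current directory" >&2
fi
```

This does not change what gets built beyond the GPU target; it only shortens the CUDA compile and rules out an architecture mismatch as a cause.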

The build succeeded, so I tested it.

It failed when I ran it on the GPU, so I dug a little deeper:

gdb --batch \
  -ex "set pagination off" \
  -ex "run" \
  -ex "thread apply all bt full" \
  --args \
  ./CrispASR/build/bin/crispasr \
  --backend parakeet \
  --gpu-backend cuda \
  -m parakeet-tdt-0.6b-v3-q4_k.gguf \
  -f TranscriptionOffline/noisy_audio/audio_1.wav \
  -v

It showed the following output

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
crispasr 0.6.12 (git 5a298fa8, Release) [backends: cpu,cuda]
=== build info ===
  version       : 0.6.12
  git sha       : 5a298fa8
  git date      : 2026-06-05T09:11:41+00:00
  git subject   : docs.melotts.-_update_README_+_tts.md_with_BERT_companion_+_neural_G2P
  build date    : 2026-06-05T09:16:24Z
  build type    : Release
  compiler      : gcc 11.4.0
  os            : linux
  arch          : x86_64
  ggml backends : cpu,cuda

=== runtime env ===
  PATH                         = /opt/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin
  LD_LIBRARY_PATH              = /usr/lib64-nvidia
  LIBRARY_PATH                 = /usr/local/cuda/lib64/stubs
  CUDA_HOME                    = (unset)
  CUDA_PATH                    = (unset)
  CUDA_VISIBLE_DEVICES         = (unset)
  NVIDIA_VISIBLE_DEVICES       = all
  NVIDIA_DRIVER_CAPABILITIES   = compute,utility
  GGML_CUDA_DEBUG              = 1
  GGML_VK_VISIBLE_DEVICES      = (unset)
  GGML_VK_PIPELINE_CACHE_DEBUG = 1
  CRISPASR_USE_CUDA_COMPAT     = (unset)
  CRISPASR_BACKEND             = (unset)
  CRISPASR_CACHE_DIR           = (unset)

=== ld.so.conf.d (cuda-compat shadows host libcuda when present) ===
  /etc/ld.so.conf.d/
    987_cuda-13.conf
    000_cuda.conf
    988_cuda-12.conf

=== /usr/local/cuda/compat (forward-compat libcuda — opt-in only) ===

=== nvidia driver (host) ===
  --- /proc/driver/nvidia/version ---
  NVRM version: NVIDIA UNIX Open Kernel Module for x86_64  580.82.07  Release Build  (builder@28e54e79972f)  Thu Apr 30 18:50:30 UTC 2026
  GCC version:  Selected multilib: .;@m64

=== ggml backends + devices ===
[New Thread 0x15550b1ff000 (LWP 51519)]
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 14912 MiB):
  Device 0: Tesla T4, compute capability 7.5, VMM: yes, VRAM: 14912 MiB
  registered backends: 2
    [0] CUDA (devices: 1)
    [1] CPU (devices: 1)
  registered devices : 2
[New Thread 0x155509b4a000 (LWP 51520)]
    [0] gpu    name=CUDA0 desc=Tesla T4 mem=14807/14912 MiB id=0000:00:04.0
    [1] cpu    name=CPU desc=Intel(R) Xeon(R) CPU @ 2.00GHz mem=12975/12975 MiB id=?

crispasr[verbose]: model arg          = 'parakeet-tdt-0.6b-v3-q4_k.gguf'
crispasr[verbose]: backend arg        = 'parakeet'
crispasr[verbose]: use_gpu            = true
crispasr[verbose]: gpu_backend        = 'cuda'
crispasr[verbose]: gpu_device         = 0
crispasr[verbose]: cache_dir override = '(default)'
crispasr[verbose]: auto_download      = false
crispasr[verbose]: n_threads          = 2
crispasr[verbose]: flash_attn         = true
crispasr[verbose]: resolved model     = 'parakeet-tdt-0.6b-v3-q4_k.gguf'

Thread 1 "crispasr" received signal SIGSEGV, Segmentation fault.
0x00005555555af168 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long) ()

Thread 3 (Thread 0x155509b4a000 (LWP 51520) "cuda-EvtHandlr"):
#0  0x0000155554532c4f in __GI___poll (fds=0x1554f8000c20, nfds=11, timeout=49) at ../sysdeps/unix/sysv/linux/poll.c:29
        sc_ret = -516
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000015554831cb37 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#2  0x000015554840bd77 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#3  0x0000155548308883 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#4  0x00001555544aeac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488333968, -3690738972966642965, 23454979235840, 34, 23456230598608, 140737488334320, -1842588997115569429, -1842688164752254229}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00001555545408d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 2 (Thread 0x15550b1ff000 (LWP 51519) "cuda00001400006"):
#0  0x0000155554532c4f in __GI___poll (fds=0x555555f2bbc0, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
        sc_ret = -516
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000015554831cb37 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#2  0x000015554840bd77 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#3  0x0000155548308883 in ?? () from /usr/lib64-nvidia/libcuda.so.1
No symbol table info available.
#4  0x00001555544aeac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488333456, -3690738972966642965, 23455003045888, 2, 23456230598608, 140737488333808, -1842594234291315989, -1842688164752254229}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00001555545408d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 1 (Thread 0x1555543b6000 (LWP 51512) "crispasr"):
#0  0x00005555555af168 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long) ()
No symbol table info available.
#1  0x0000155554e0dccf in (anonymous namespace)::fill(CrispasrRegistryEntry&, (anonymous namespace)::Entry const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /content/CrispASR/build/src/libcrispasr.so.1
No symbol table info available.
#2  0x0000155554e0e381 in crispasr_registry_lookup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, CrispasrRegistryEntry&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /content/CrispASR/build/src/libcrispasr.so.1
No symbol table info available.
#3  0x0000555555634c19 in crispasr_run_backend(whisper_params const&) ()
No symbol table info available.
#4  0x000055555558f88f in main ()
No symbol table info available.
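
Every frame in thread 1 reports "No symbol table info available" because this is a Release build. A sketch of rebuilding with debug info so the same gdb command can show file names, line numbers, and locals inside crispasr_registry_lookup. RelWithDebInfo is a standard CMake build type; the build-debug directory name and the resulting binary path are assumptions.

```shell
SRC_DIR=CrispASR
BUILD_DIR="$SRC_DIR/build-debug"   # hypothetical separate build tree
BUILD_TYPE=RelWithDebInfo          # optimized build that keeps debug symbols

if [ -d "$SRC_DIR" ]; then
  cmake -S "$SRC_DIR" -B "$BUILD_DIR" \
    -DCMAKE_BUILD_TYPE="$BUILD_TYPE" \
    -DGGML_CUDA=ON
  cmake --build "$BUILD_DIR" -j"$(nproc)"
  # Then re-run the gdb command above against "$BUILD_DIR/bin/crispasr"
  # (assumed to mirror the build/bin/crispasr layout of the Release build).
else
  echo "CrispASR checkout not found in the current directory" >&2
fi
```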

Note that the binaries run fine when the backend is CPU.

This can be reproduced on Google Colab.

Remove CUDA 12:

# Purge all pre-installed CUDA 12 packages
!apt-get --purge remove "*cuda*" "*cublas*" "*nsight*" -y
!apt-get autoremove --purge -y
!apt-get clean

# Remove leftover environment pathways
!rm -rf /usr/local/cuda*

Add CUDA 13 in Google Colab:

# Download the network repository setup for Ubuntu (Colab uses Ubuntu)
!wget https://nvidia.com
!dpkg -i cuda-keyring_1.1-1_all.deb
!apt-get update

# Install the CUDA 13 toolkit
!apt-get install cuda-13-0 -y
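
Before building, it may help to confirm that the toolkit on PATH really is CUDA 13 and that the loader can see its runtime. A minimal sketch, written as plain shell (in Colab it would go in a %%bash cell); parse_release is a hypothetical helper that pulls the version out of the "release X.Y" text that nvcc --version prints.

```shell
# Extract the toolkit version from `nvcc --version` output read on stdin.
parse_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Version of the nvcc found on PATH (empty if nvcc is not installed).
NVCC_RELEASE=$(nvcc --version 2>/dev/null | parse_release || true)
echo "nvcc release: ${NVCC_RELEASE:-not found}"

# CUDA runtime libraries visible to the dynamic loader, if any.
ldconfig -p 2>/dev/null | grep libcudart || true
```

If the reported release is still 12.x, an older toolkit is shadowing the new one on PATH and the build would pick it up.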

I downloaded the prebuilt GGUF files from your Hugging Face account.

!wget -O parakeet-tdt-0.6b-v3-q4_k.gguf \
"https://huggingface.co/cstr/parakeet-tdt-0.6b-v3-GGUF/resolve/main/parakeet-tdt-0.6b-v3-q4_k.gguf"

Will CUDA 13 be supported on Linux in the future? Last time I checked, crispasr.exe on Windows worked with CUDA 13 and even used the GPU. I can confirm that again on Windows, with logs, if you want.

Thanks
I wish this project lots of success.
