some bug

4
00:00:19,313 --> 00:00:22,668
Because what we have seen in the video abo

5
00:00:22,669 --> 00:00:25,898
ut conversation threads is there is the ag

6
00:00:26,548 --> 00:00:29,128
ent thread uh that can keep track of the t

7
00:00:29,129 --> 00:00:32,358
his information and then you can serialize

8
00:00:32,359 --> 00:00:35,598
and deserialize that uh so you can store

9
00:00:35,599 --> 00:00:38,828
uh the various conversations you have, or

10
00:00:39,278 --> 00:00:41,678
you can do a self uh implemented way.

Date: 06/06/2026 10:55:09
SE: v5.0.0-rc3 - Microsoft Windows NT 10.0.19045.0 - 64-bit
C:\Aitool\SE5\CrispASR\crispasr.exe --backend parakeet -m "C:\Aitool\SE5\CrispASR\models\parakeet-tdt-0.6b-v3.gguf" -f "C:\Users\Administrator\AppData\Local\Temp\se_audioclip_6a797a20-ae14-4ad2-ae26-b9716e684e58.wav" --output-srt -ck 250 --split-on-punct

-----------------------------------------------------------------------------
Date: 06/06/2026 10:55:15
SE: v5.0.0-rc3 - Microsoft Windows NT 10.0.19045.0 - 64-bit
Calling speech-to-text (Crisp ASR Parakeet) with : C:\Aitool\SE5\CrispASR\crispasr.exe --backend parakeet -m "C:\Aitool\SE5\CrispASR\models\parakeet-tdt-0.6b-v3.gguf" -f "C:\Users\Administrator\AppData\Local\Temp\se_audioclip_6a797a20-ae14-4ad2-ae26-b9716e684e58.wav" --output-srt -ck 250 --split-on-punct
crispasr 0.6.12 (git 67ae33b4, Release) [backends: cpu,cuda]
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 16379 MiB):
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes, VRAM: 16379 MiB
parakeet: vocab=8192  d_model=1024  n_layers=24  n_heads=8  ff=4096  pred=640  joint=640
parakeet: BN folded into conv_dw weights for 24 layers
crispasr: audio: 364800 samples (22.8 s) @ 16000 Hz, 4 threads
crispasr[lid]: using cached C:\Users\Administrator/.cache/crispasr/ggml-tiny.bin
whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\Administrator/.cache/crispasr/ggml-tiny.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:        CUDA0 total size =    77.11 MB
whisper_model_load: model size    =   77.11 MB
whisper_backend_init_gpu: device 0: CUDA0 (type: 1)
whisper_backend_init_gpu: found GPU device 0: CUDA0 (type: 1, cnt: 0)
whisper_backend_init_gpu: using CUDA0 backend
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   15.87 MB
whisper_init_state: compute buffer (encode) =   17.98 MB
whisper_init_state: compute buffer (cross)  =    4.16 MB
whisper_init_state: compute buffer (decode) =   97.09 MB
crispasr[lid]: detected 'en' (p=0.865) via whisper
crispasr: LID -> language = 'en' (whisper, p=0.865)
crispasr: transcribed 22.8s audio in 4.29s (5.3x realtime)
[00:00:00.160 --> 00:00:03.390]  Because what we have seen in the video abo
[00:00:03.390 --> 00:00:06.620]  ut conversation threads is there is the ag
[00:00:06.620 --> 00:00:09.850]  ent thread uh that can keep track of the t
[00:00:09.850 --> 00:00:13.080]  his information and then you can serialize
[00:00:13.080 --> 00:00:16.320]  and deserialize that uh so you can store
[00:00:16.320 --> 00:00:19.550]  uh the various conversations you have, or
[00:00:19.550 --> 00:00:22.400]  you can do a self uh implemented way.
Speech to text (Crisp ASR Parakeet) done in 00:00:05.4023280
Loading result from C:\Users\Administrator\AppData\Local\Temp\se_audioclip_6a797a20-ae14-4ad2-ae26-b9716e684e58.srt

some bug #150