4
00:00:19,313 --> 00:00:22,668
Because what we have seen in the video about
5
00:00:22,669 --> 00:00:25,898
conversation threads is that there is the agent
6
00:00:26,548 --> 00:00:29,128
thread that can keep track of
7
00:00:29,129 --> 00:00:32,358
this information, and then you can serialize
8
00:00:32,359 --> 00:00:35,598
and deserialize that, so you can store
9
00:00:35,599 --> 00:00:38,828
the various conversations you have, or
10
00:00:39,278 --> 00:00:41,678
you can do it in a self-implemented way.
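The "self-implemented way" mentioned in the cues above could look roughly like the sketch below. It is only an illustration under stated assumptions: `ConversationThread`, its fields, and its `serialize`/`deserialize` methods are hypothetical names invented here, not the agent thread type from the video or any real library API. The idea is simply that the thread holds the message history, is turned into a JSON string for storage, and is rebuilt from that string later.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ConversationThread:
    """Hypothetical stand-in for an agent thread; it only holds message history."""
    thread_id: str
    messages: list = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        # Append one message to the running history.
        self.messages.append({"role": role, "text": text})

    def serialize(self) -> str:
        # Turn the whole thread into a JSON string that can be stored anywhere.
        return json.dumps(asdict(self))

    @classmethod
    def deserialize(cls, payload: str) -> "ConversationThread":
        # Rebuild a thread from a previously stored JSON string.
        data = json.loads(payload)
        return cls(thread_id=data["thread_id"], messages=data["messages"])


# Usage: store a conversation, then restore it later.
thread = ConversationThread("demo-1")
thread.add("user", "Hello")
thread.add("assistant", "Hi, how can I help?")
stored = thread.serialize()          # could be written to a file or database
restored = ConversationThread.deserialize(stored)
```

Keeping the stored form as plain JSON means each conversation can be saved under its own key (here the assumed `thread_id`) and reloaded independently.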
Date: 06/06/2026 10:55:09
SE: v5.0.0-rc3 - Microsoft Windows NT 10.0.19045.0 - 64-bit
C:\Aitool\SE5\CrispASR\crispasr.exe --backend parakeet -m "C:\Aitool\SE5\CrispASR\models\parakeet-tdt-0.6b-v3.gguf" -f "C:\Users\Administrator\AppData\Local\Temp\se_audioclip_6a797a20-ae14-4ad2-ae26-b9716e684e58.wav" --output-srt -ck 250 --split-on-punct
Date: 06/06/2026 10:55:15
SE: v5.0.0-rc3 - Microsoft Windows NT 10.0.19045.0 - 64-bit
Calling speech-to-text (Crisp ASR Parakeet) with : C:\Aitool\SE5\CrispASR\crispasr.exe --backend parakeet -m "C:\Aitool\SE5\CrispASR\models\parakeet-tdt-0.6b-v3.gguf" -f "C:\Users\Administrator\AppData\Local\Temp\se_audioclip_6a797a20-ae14-4ad2-ae26-b9716e684e58.wav" --output-srt -ck 250 --split-on-punct
crispasr 0.6.12 (git 67ae33b, Release) [backends: cpu,cuda]
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 16379 MiB):
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes, VRAM: 16379 MiB
parakeet: vocab=8192 d_model=1024 n_layers=24 n_heads=8 ff=4096 pred=640 joint=640
parakeet: BN folded into conv_dw weights for 24 layers
crispasr: audio: 364800 samples (22.8 s) @ 16000 Hz, 4 threads
crispasr[lid]: using cached C:\Users\Administrator/.cache/crispasr/ggml-tiny.bin
whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\Administrator/.cache/crispasr/ggml-tiny.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CUDA0 total size = 77.11 MB
whisper_model_load: model size = 77.11 MB
whisper_backend_init_gpu: device 0: CUDA0 (type: 1)
whisper_backend_init_gpu: found GPU device 0: CUDA0 (type: 1, cnt: 0)
whisper_backend_init_gpu: using CUDA0 backend
whisper_init_state: kv self size = 3.15 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 15.87 MB
whisper_init_state: compute buffer (encode) = 17.98 MB
whisper_init_state: compute buffer (cross) = 4.16 MB
whisper_init_state: compute buffer (decode) = 97.09 MB
crispasr[lid]: detected 'en' (p=0.865) via whisper
crispasr: LID -> language = 'en' (whisper, p=0.865)
crispasr: transcribed 22.8s audio in 4.29s (5.3x realtime)
[00:00:00.160 --> 00:00:03.390] Because what we have seen in the video abo
[00:00:03.390 --> 00:00:06.620] ut conversation threads is there is the ag
[00:00:06.620 --> 00:00:09.850] ent thread uh that can keep track of the t
[00:00:09.850 --> 00:00:13.080] his information and then you can serialize
[00:00:13.080 --> 00:00:16.320] and deserialize that uh so you can store
[00:00:16.320 --> 00:00:19.550] uh the various conversations you have, or
[00:00:19.550 --> 00:00:22.400] you can do a self uh implemented way.
Speech to text (Crisp ASR Parakeet) done in 00:00:05.4023280
Loading result from C:\Users\Administrator\AppData\Local\Temp\se_audioclip_6a797a20-ae14-4ad2-ae26-b9716e684e58.srt