Hi, I'm testing the project on a dual-GPU setup (2× RTX 3090, 24GB VRAM each). However, I'm encountering an Out-Of-Memory (OOM) error during initialization.
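The logs show the backend attempting to allocate 27260.57 MiB (about 26.6 GiB) in a single buffer on Device 0, which exceeds that card's total VRAM of 24126 MiB (about 23.6 GiB). Nothing appears to be placed on Device 1.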
Could you please clarify if multi-GPU support (e.g., tensor splitting or layer distribution across GPUs) is implemented? If so, how should I configure it to properly split the model across both cards? Any relevant config keys or CLI flags would be greatly appreciated.
Error Log:
[cfg] seq_verify=0 fast_rollback=1 ddtree=1 budget=22 temp=1.00 chain_seed=1 fa_window=2048
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 48252 MiB):
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 27260.57 MiB on device 0: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate CUDA0 buffer of size 28584775680
target load: ggml_backend_alloc_ctx_tensors failed (target)
^CINFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [157036]
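For reference, here is a quick sanity check of the numbers in the log above. It is only a sketch of the arithmetic: the byte count and per-device VRAM are copied from the log, and the "even share" figure assumes the weights could be divided equally between the two cards. It ignores KV cache, compute buffers, and any draft model, and it says nothing about whether the project actually supports such a split.

```python
# Sizes copied from the log above; everything else is plain arithmetic.
MIB = 1024 ** 2
GIB = 1024 ** 3

requested_bytes = 28584775680      # "failed to allocate CUDA0 buffer of size ..."
device_vram_mib = [24126, 24126]   # per-device VRAM reported by ggml_cuda_init

requested_mib = requested_bytes / MIB

# Does the single buffer fit on one card, or only across both?
fits_on_one = requested_mib <= device_vram_mib[0]
fits_combined = requested_mib <= sum(device_vram_mib)

# Hypothetical even split (assumption: weights can be divided equally).
even_share_mib = requested_mib / len(device_vram_mib)

print(f"requested:      {requested_mib:.2f} MiB ({requested_bytes / GIB:.2f} GiB)")
print(f"fits on one:    {fits_on_one}")
print(f"fits combined:  {fits_combined}")
print(f"even share:     {even_share_mib:.2f} MiB per device")
```

So the target model cannot fit on a single 3090, but it would fit within the combined 48252 MiB if the load can be distributed across both devices.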