Multi-GPU support: OOM error with dual RTX 3090

Hi, I'm testing the project on a dual-GPU setup (2× RTX 3090, 24 GB VRAM each), and I'm hitting an out-of-memory (OOM) error during initialization.

The logs show the backend attempting to allocate a single buffer of 27260.57 MiB (about 26.6 GiB) on `Device 0`, which exceeds that card's total VRAM of 24126 MiB (about 23.6 GiB). Nothing is allocated on `Device 1`.
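To double-check the numbers, here is a minimal sketch that only parses the log lines from this issue. It does not use the project's API, and the helper name `parse_log` is my own. It suggests the buffer cannot fit on one card but would fit within the combined VRAM, under the assumption that nothing else (KV cache, compute buffers, other processes) needs memory, which is optimistic.

```python
import re

# Lines copied verbatim from the error log in this issue.
LOG = """\
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 48252 MiB):
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 27260.57 MiB on device 0: cudaMalloc failed: out of memory
"""


def parse_log(log: str):
    """Hypothetical helper: return ({device: vram_mib}, requested_mib, target_device)."""
    vram = {
        int(idx): int(mib)
        for idx, mib in re.findall(r"Device (\d+):.*VRAM: (\d+) MiB", log)
    }
    match = re.search(r"allocating ([\d.]+) MiB on device (\d+)", log)
    if match is None:
        raise ValueError("no allocation line found in log")
    return vram, float(match.group(1)), int(match.group(2))


vram, requested, device = parse_log(LOG)

# Does the request fit on the card it was sent to?
fits_one = requested <= vram[device]
# Would it fit if it could be split across all cards?
# (Assumption: ignores KV cache, compute buffers, and any other VRAM usage.)
fits_combined = requested <= sum(vram.values())
shortfall = requested - vram[device]

print(f"requested: {requested:.2f} MiB on device {device}")
print(f"fits on device {device}: {fits_one} (short by {shortfall:.2f} MiB)")
print(f"fits in combined VRAM: {fits_combined}")
```

So the model is only about 3 GiB too large for a single card, which is why I'm hoping it can be split across both.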

Could you clarify whether multi-GPU support (e.g., tensor splitting or layer distribution across GPUs) is implemented? If it is, how should I configure it so the model is split across both cards? Pointers to the relevant config keys or CLI flags would be greatly appreciated.

**Error Log:**
```text
[cfg] seq_verify=0 fast_rollback=1 ddtree=1 budget=22 temp=1.00 chain_seed=1 fa_window=2048
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 48252 MiB):
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 27260.57 MiB on device 0: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate CUDA0 buffer of size 28584775680
target load: ggml_backend_alloc_ctx_tensors failed (target)
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [157036]
```
