Multi-GPU support: OOM error with dual RTX 3090 #42

@Ksdb104

Hi, I'm testing the project on a dual-GPU setup (2× RTX 3090, 24GB VRAM each). However, I'm encountering an Out-Of-Memory (OOM) error during initialization.

The logs show the backend attempting to allocate 27,260.57 MiB (about 26.6 GiB) on Device 0, which exceeds that card's total VRAM of 24,126 MiB (about 23.6 GiB).
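
For reference, the unit arithmetic behind these figures can be sketched in Python. The two constants are copied from the error log in this issue; the helper names are only illustrative. This is a rough check of the reported numbers, not a measurement of what the backend actually needs.

```python
# Figures copied from the error log in this issue.
REQUESTED_BYTES = 28_584_775_680  # "failed to allocate CUDA0 buffer of size ..."
DEVICE_VRAM_MIB = 24_126          # VRAM reported for each RTX 3090

MIB = 1024 ** 2  # bytes per MiB


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB (binary megabytes)."""
    return n_bytes / MIB


def mib_to_gib(mib: float) -> float:
    """Convert MiB to GiB."""
    return mib / 1024


requested_mib = bytes_to_mib(REQUESTED_BYTES)
# How far the single buffer overshoots one card's total VRAM.
shortfall_mib = requested_mib - DEVICE_VRAM_MIB

print(f"requested: {requested_mib:.2f} MiB ({mib_to_gib(requested_mib):.2f} GiB)")
print(f"device 0:  {DEVICE_VRAM_MIB} MiB ({mib_to_gib(DEVICE_VRAM_MIB):.2f} GiB)")
print(f"shortfall: {shortfall_mib:.2f} MiB")
```

The byte count and the MiB figure in the log describe the same buffer, and that buffer is roughly 3 GiB larger than one card's total VRAM, before any other allocations are counted.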

Could you please clarify if multi-GPU support (e.g., tensor splitting or layer distribution across GPUs) is implemented? If so, how should I configure it to properly split the model across both cards? Any relevant config keys or CLI flags would be greatly appreciated.

Error Log:

[cfg] seq_verify=0 fast_rollback=1 ddtree=1 budget=22 temp=1.00 chain_seed=1 fa_window=2048
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 48252 MiB):
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 27260.57 MiB on device 0: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate CUDA0 buffer of size 28584775680
target load: ggml_backend_alloc_ctx_tensors failed (target)
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [157036]
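
The relevant numbers can also be pulled out of a log like the one above with a small script. This is a minimal sketch: the regular expressions assume the exact line formats shown in this issue (other ggml builds may print them differently), and `parse_cuda_log` is a hypothetical helper, not part of the project.

```python
import re

# Relevant lines copied from the error log in this issue.
LOG = """\
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 48252 MiB):
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 27260.57 MiB on device 0: cudaMalloc failed: out of memory
"""

# Assumed formats, based only on the lines above.
DEVICE_RE = re.compile(r"Device (\d+): .*?VRAM: (\d+) MiB")
ALLOC_FAIL_RE = re.compile(
    r"allocating ([\d.]+) MiB on device (\d+): cudaMalloc failed"
)


def parse_cuda_log(text):
    """Return ({device_index: vram_mib}, [(device_index, requested_mib), ...])."""
    devices = {int(idx): int(vram) for idx, vram in DEVICE_RE.findall(text)}
    failures = [(int(dev), float(mib)) for mib, dev in ALLOC_FAIL_RE.findall(text)]
    return devices, failures


devices, failures = parse_cuda_log(LOG)
total_mib = sum(devices.values())

for dev, requested in failures:
    print(
        f"device {dev}: requested {requested:.2f} MiB, "
        f"device VRAM {devices[dev]} MiB, all devices {total_mib} MiB"
    )
```

For this log the failed request is larger than either card's VRAM but smaller than the combined total. That only shows the buffer size relative to the hardware; whether the backend can actually split the allocation across devices is the open question of this issue.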

Labels: enhancement (New feature or request)