Implement context-length dependent KV-cache and Compute Buffer aware … by Nexesenex · Pull Request #335 · Nexesenex/croco.cpp

Nexesenex · 2025-07-04T15:01:42Z

…layer distribution for heterogeneous multi-GPU inference. Solves the problem of running setups whose GPUs have different amounts of VRAM (e.g. 24GB cards alongside 6GB cards). Previously, layers were assigned without accounting for the compute buffer, so allocation failed when one or more of the smaller GPUs could not hold it.

- Add requested_n_ctx parameter to llama_model_params
- Implement 3-pass allocation algorithm that accounts for compute buffers
- Add device exclusion for insufficient memory (GPUs too small to hold 1 layer + KV cache + compute buffer are excluded)
- Add layer redistribution to make equitable use of the included GPUs (may not be truly optimal)
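
The items above can be illustrated with a small standalone sketch. It is one possible reading of the description, not the code in this PR: `DeviceInfo`, `ModelCosts`, `distribute_layers` and the split into these particular three passes are all hypothetical, and the per-device compute buffer is assumed to be a fixed size. Only the idea is taken from the text: the per-layer KV-cache cost scales with the requested context length, devices that cannot hold the compute buffer plus one layer are excluded, and the layers are then spread over the remaining devices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical inputs; none of these names come from the PR.
struct DeviceInfo {
    std::uint64_t free_bytes;             // free VRAM reported for the device
};

struct ModelCosts {
    std::uint64_t layer_bytes;            // weights of one layer
    std::uint64_t kv_bytes_per_ctx_token; // KV cache per layer, per context token
    std::uint64_t compute_buffer_bytes;   // per-device compute buffer (assumed fixed)
};

// Returns the number of layers placed on each device (0 = excluded or unused).
// Layers that fit on no device are left unassigned; a caller would keep them on the CPU.
std::vector<int> distribute_layers(const std::vector<DeviceInfo>& devs,
                                   const ModelCosts& m,
                                   std::uint32_t requested_n_ctx,
                                   int n_layers) {
    // Cost of one layer including its share of the KV cache at the requested context.
    const std::uint64_t per_layer =
        m.layer_bytes + m.kv_bytes_per_ctx_token * requested_n_ctx;
    assert(per_layer > 0 && n_layers >= 0);

    // Pass 1: capacity in layers per device after reserving the compute buffer.
    // A device that cannot hold the buffer plus one layer keeps capacity 0 (excluded).
    std::vector<int> cap(devs.size(), 0);
    std::uint64_t total_cap = 0;
    for (std::size_t i = 0; i < devs.size(); ++i) {
        if (devs[i].free_bytes < m.compute_buffer_bytes + per_layer) continue;
        const std::uint64_t c = (devs[i].free_bytes - m.compute_buffer_bytes) / per_layer;
        cap[i] = static_cast<int>(std::min<std::uint64_t>(c, static_cast<std::uint64_t>(n_layers)));
        total_cap += static_cast<std::uint64_t>(cap[i]);
    }

    // Pass 2: share proportional to capacity, rounded down and never above capacity.
    std::vector<int> out(devs.size(), 0);
    int assigned = 0;
    if (total_cap > 0) {
        for (std::size_t i = 0; i < devs.size(); ++i) {
            const std::uint64_t share =
                static_cast<std::uint64_t>(n_layers) * static_cast<std::uint64_t>(cap[i]) / total_cap;
            out[i] = std::min(cap[i], static_cast<int>(share));
            assigned += out[i];
        }
    }

    // Pass 3: redistribute the rounding remainder, one layer at a time,
    // to the device that currently has the most spare capacity.
    while (assigned < n_layers) {
        int best = -1;
        for (std::size_t i = 0; i < devs.size(); ++i) {
            const int spare = cap[i] - out[i];
            if (spare > 0 && (best < 0 || spare > cap[best] - out[best])) {
                best = static_cast<int>(i);
            }
        }
        if (best < 0) break; // every included device is full
        ++out[best];
        ++assigned;
    }
    return out;
}
```

With made-up sizes (two devices with 24000 units free, one with 6000, a 2000-unit compute buffer, 500 units per layer and 1 unit of KV cache per layer per token), 32 layers at a context of 500 split as 15/15/2. At a context of 4000 the small device can no longer hold the buffer plus one layer and is excluded, and each large device takes only the 4 layers it has room for.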

Nexesenex merged commit ccb3e6c into Nexesenex:lcpp_pr_kv_aware_layer_distrib Jul 4, 2025
46 of 47 checks passed

Nexesenex merged 1 commit into Nexesenex:lcpp_pr_kv_aware_layer_distrib from borebot:kv-compute-buffer-cache-aware-allocation

2 participants