Change ov backend buffer is_host to false #34

wine99 · 2026-01-21T07:48:07Z

This PR changes the OpenVINO backend buffer is_host flag to false, aligning it with the CPU repack buffer behavior.

The OpenVINO backend buffer is conceptually similar to the CPU repack buffer. Both repack (i.e., reorder) quantized values. The OpenVINO backend goes further and also extracts zero-points and related quantization metadata.

    static struct ggml_backend_buffer_type ggml_backend_cpu_buffer_type_repack = {
        /* .iface    = */ {
                           /* .get_name         = */ ggml_backend_cpu_repack_buffer_type_get_name,
                           /* .alloc_buffer     = */ ggml_backend_cpu_repack_buffer_type_alloc_buffer,
                           /* .get_alignment    = */ ggml_backend_cpu_repack_buffer_type_get_alignment,
                           /* .get_max_size     = */ nullptr,  // defaults to SIZE_MAX
                           /* .get_alloc_size   = */ nullptr,  // defaults to ggml_nbytes
                           /* .is_host          = */ nullptr,
                           },
        /* .device  = */ ggml_backend_reg_dev_get(ggml_backend_cpu_reg(), 0),
        /* .context = */ new ggml::cpu::repack::extra_buffer_type(),
    };

This change fixes the following issue.

In llama-model-loader.cpp, the loading logic is:

    if (use_mmap) {
        ....
            ggml_backend_tensor_set(cur, data, 0, n_size);
        ....
    } else {
        const auto & file = files.at(weight->idx);

        if (ggml_backend_buffer_is_host(cur->buffer)) {
            file->seek(weight->offs, SEEK_SET);
            file->read_raw(cur->data, n_size);
            ....
        } else {
            ....
                read_buf.resize(n_size);
                file->seek(weight->offs, SEEK_SET);
                file->read_raw(read_buf.data(), n_size);
                ggml_backend_tensor_set(cur, read_buf.data(), 0, n_size);
            ....
            }
        }
    }

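llama.cpp now uses direct I/O by default instead of mmap when loading models. The mmap branch calls ggml_backend_tensor_set, but in the non-mmap branch a host buffer receives the file bytes directly in cur->data via read_raw, bypassing ggml_backend_tensor_set entirely. Because the OpenVINO backend performs quantized weight extraction inside ggml_backend_tensor_set, the loader must take the non-host path. Therefore, ggml_backend_buffer_is_host must return false for OpenVINO buffers.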

wine99 · 2026-01-21T07:50:17Z

@ynimmaga @cavusmustafa Feel free to merge this while I’m offline if it looks good to you.

wine99 · 2026-01-21T14:32:46Z

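llama-bench -p 128 -n 32 is failing: the pp128 test completes, then the run aborts. It also fails on the dev_backend_openvino branch, so the failure does not appear to be introduced by this PR.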

    | model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
    | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
    | llama 1B Q4_1                  | 785.75 MiB |     1.24 B | OPENVINO   |  99 |  1 |           pp128 |        991.94 ± 0.00 |
    terminate called after throwing an instance of 'std::runtime_error'
      what():  ggml tensor extra is not of type TENSOR for input: cache_k_l0

cavusmustafa

LGTM

Change ov backend buffer is_host to false

9a15c8b

github-actions bot added the ggml label Jan 21, 2026

wine99 requested review from cavusmustafa and ynimmaga January 21, 2026 07:49

cavusmustafa approved these changes Jan 21, 2026

cavusmustafa merged commit be2d4b6 into dev_backend_openvino Jan 21, 2026
1 check passed

3 participants