Skip to content

[Vulkan] Resource-aware decode classification not used by CPU access paths #31

@goniz

Description

@goniz

Problem

Issue #15 (resource-aware decode synchronization) was closed as completed, implementing DecodeResourceClass classification and safe_to_skip_tail_barrier() logic in device.cpp. However, the CPU access paths in allocator.cpp don't leverage this classification:

// device.cpp: Resource classification exists (from #15)
enum class DecodeResourceClass : uint8_t {
  ReadOnlyWeight,
  AppendOnlyKVWrite,
  TokenScratch,
  PersistentOutput,
  Generic,
};

// Classification logic exists
static DecodeResourceClass classify_decode_resource(...) {
  if (is_scratch_array(stream, arr)) return DecodeResourceClass::TokenScratch;
  if (!output) return DecodeResourceClass::ReadOnlyWeight;
  // ...
}

// safe_to_skip_tail_barrier() exists
bool safe_to_skip_tail_barrier() const {
  return seen_resource && persistent_outputs == 0 && generic == 0;
}

But in allocator.cpp, raw_ptr() knows nothing about this:

void* Buffer::raw_ptr() {
  // ...
  if (buf->mapped_ptr != nullptr) {
    // Only uses buffer's last_semaphore - doesn't check if this is
    // a ReadOnlyWeight that could skip sync entirely
  }
  
  // Falls through to synchronize_all() without checking resource class
  vulkan::synchronize_all();  // Ignores resource classification!
}

The Gap

The resource classification identifies patterns like:

  • ReadOnlyWeight: Never written by GPU, CPU access needs no sync
  • TokenScratch: Write-only by GPU, CPU readback unlikely
  • AppendOnlyKVWrite: Special KV cache pattern

But raw_ptr() and the copy fallback paths don't check these classes - they just sync unconditionally.

Why This Matters

For example, reading a weight buffer (ReadOnlyWeight) that was uploaded once and never GPU-written should need zero synchronization. But raw_ptr() calls synchronize_all() anyway.

This wastes the work done in #15 to identify safe patterns.

Tasks

  • Expose resource classification to allocator/CPU access paths
  • Add resource class tracking to VulkanBuffer or external lookup
  • In raw_ptr(), skip sync for ReadOnlyWeight buffers with no GPU writes
  • In copy fallback, skip sync for buffers classified as safe
  • Add test: verify ReadOnlyWeight buffer access doesn't sync
  • Profile Qwen3 decode to measure sync reduction

Acceptance Criteria

  • CPU access to ReadOnlyWeight buffers doesn't trigger sync
  • Resource classification is consulted in CPU access decisions
  • safe_to_skip_tail_barrier() logic extended to CPU access paths
  • Qwen3 decode shows fewer syncs in trace output
  • No correctness regressions

Code References

  • mlx/backend/vulkan/device.cpp:439-483 (DecodeResourceClass from [Vulkan] Make decode synchronization resource-aware #15)
  • mlx/backend/vulkan/device.cpp:1551-1597 (classify_decode_resource)
  • mlx/backend/vulkan/allocator.cpp:41-71 (raw_ptr sync logic)
  • mlx/backend/vulkan/allocator.h:16-44 (VulkanBuffer struct)

Related

Note

This is a gap in closed issue #15, not a regression. The classification framework exists, but CPU access paths weren't integrated with it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions