Problem
Issue #15 (resource-aware decode synchronization) was closed as completed, implementing DecodeResourceClass classification and safe_to_skip_tail_barrier() logic in device.cpp. However, the CPU access paths in allocator.cpp don't use this classification. In particular, raw_ptr() knows nothing about it:
void* Buffer::raw_ptr() {
  // ...
  if (buf->mapped_ptr != nullptr) {
    // Only uses buffer's last_semaphore - doesn't check if this is
    // a ReadOnlyWeight that could skip sync entirely
  }
  // Falls through to synchronize_all() without checking resource class
  vulkan::synchronize_all(); // Ignores resource classification!
}
The Gap
The resource classification identifies patterns like:
ReadOnlyWeight: Never written by GPU, CPU access needs no sync
TokenScratch: Write-only by GPU, CPU readback unlikely
AppendOnlyKVWrite: Special KV cache pattern
But raw_ptr() and the copy fallback paths don't check these classes - they just sync unconditionally.
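The decision these classes imply for CPU access could be sketched as a single predicate. This is only an illustration: `ResourceClassSketch`, `cpu_access_needs_sync`, and the `gpu_write_pending` flag are hypothetical names, not the actual `DecodeResourceClass` definition in device.cpp, which may have more members or different semantics. Only `ReadOnlyWeight` is treated as skippable here, since it is the one case the issue states needs no sync. Every other class stays conservative.

```cpp
#include <cstdint>

// Hypothetical mirror of the classes named in this issue. The real
// DecodeResourceClass enum in device.cpp may differ.
enum class ResourceClassSketch : std::uint8_t {
  Unknown,
  ReadOnlyWeight,
  TokenScratch,
  AppendOnlyKVWrite,
};

// Returns true when a CPU access should wait for the GPU first.
// gpu_write_pending is an assumed flag meaning "a GPU write to this buffer
// was submitted and has not been waited on yet".
constexpr bool cpu_access_needs_sync(ResourceClassSketch cls,
                                     bool gpu_write_pending) {
  switch (cls) {
    case ResourceClassSketch::ReadOnlyWeight:
      // Uploaded once and never GPU-written: sync only if that
      // invariant has been violated.
      return gpu_write_pending;
    case ResourceClassSketch::TokenScratch:
    case ResourceClassSketch::AppendOnlyKVWrite:
    case ResourceClassSketch::Unknown:
      // Conservative default: keep today's behavior and sync.
      return true;
  }
  return true;  // Unreachable for valid enum values; stay conservative.
}
```

Keeping the non-weight classes on the synchronizing path means a misclassified buffer costs performance rather than correctness.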
Why This Matters
For example, reading a weight buffer (ReadOnlyWeight) that was uploaded once and never GPU-written should need zero synchronization. But raw_ptr() calls synchronize_all() anyway.
This wastes the work done in #15 to identify safe patterns.
Tasks
Expose resource classification to allocator/CPU access paths
Add resource class tracking to VulkanBuffer or external lookup
In raw_ptr(), skip sync for ReadOnlyWeight buffers with no GPU writes
In copy fallback, skip sync for buffers classified as safe
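One possible shape for these tasks, as a minimal sketch rather than the actual allocator code: the buffer carries its resource class plus a "GPU has written" flag, and both raw_ptr() and the copy fallback consult them before synchronizing. Everything in the `sketch` namespace is a hypothetical stand-in (`BufferSketch`, `gpu_written`, `copy_to_host`, and a counting stub in place of `vulkan::synchronize_all()`). The real VulkanBuffer layout and the existing last_semaphore path are not modeled.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

namespace sketch {

// Hypothetical copy of the classes named in this issue.
enum class ResourceClass { Unknown, ReadOnlyWeight, TokenScratch, AppendOnlyKVWrite };

// Counter behind the stub below, so skipped syncs are observable.
inline int& sync_call_count() {
  static int count = 0;
  return count;
}

// Stand-in for vulkan::synchronize_all(); it only counts calls.
inline void synchronize_all() { ++sync_call_count(); }

struct BufferSketch {
  std::vector<unsigned char> storage;  // Stands in for mapped device memory.
  ResourceClass resource_class = ResourceClass::Unknown;
  bool gpu_written = false;  // Assumed to be set when a GPU write is recorded.

  // Safe to skip only for weights that the GPU has never written.
  bool can_skip_sync() const {
    return resource_class == ResourceClass::ReadOnlyWeight && !gpu_written;
  }

  void* raw_ptr() {
    if (!can_skip_sync()) {
      synchronize_all();  // Conservative path, same as today's behavior.
    }
    return storage.data();
  }
};

// Copy fallback: applies the same check before reading the source buffer.
inline void copy_to_host(BufferSketch& src, void* dst, std::size_t n) {
  if (!src.can_skip_sync()) {
    synchronize_all();
  }
  // Clamp to the buffer size so the sketch never reads out of bounds.
  const std::size_t bytes = n < src.storage.size() ? n : src.storage.size();
  std::memcpy(dst, src.storage.data(), bytes);
}

}  // namespace sketch
```

Storing the class on the buffer keeps the check to a field read. An external lookup keyed by buffer handle would avoid changing the struct, at the cost of a map access on every raw_ptr() call.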
Acceptance Criteria
safe_to_skip_tail_barrier() logic extended to CPU access paths
Code References
mlx/backend/vulkan/device.cpp:439-483 (DecodeResourceClass from [Vulkan] Make decode synchronization resource-aware #15)
mlx/backend/vulkan/device.cpp:1551-1597 (classify_decode_resource)
mlx/backend/vulkan/allocator.cpp:41-71 (raw_ptr sync logic)
mlx/backend/vulkan/allocator.h:16-44 (VulkanBuffer struct)
Related
Note
This is a gap in closed issue #15, not a regression. The classification framework exists, but CPU access paths weren't integrated with it.