[VL] Support file cache spill in Gluten

### Description

The Velox backend provides a two-level file cache (`AsyncDataCache` and `SsdCache`), which we enabled in [PR](https://github.com/apache/incubator-gluten/pull/1076/files) using a dedicated `MMapAllocator` initialized with the configured capacity. This memory is not counted as execution memory or storage memory, and it is not managed by Spark's `UnifiedMemoryManager`. This ticket proposes to close that gap with the following design:

- Add a `NativeStorageMemory` segment inside vanilla `StorageMemory`. A new configuration, `spark.memory.native.storageFraction`, defines its size. The resulting size, `offheap.memory * spark.memory.storageFraction * spark.memory.native.storageFraction`, is used to initialize `AsyncDataCache`.
- Add a configuration, `spark.memory.storage.preferSpillNative`, that determines whether the RDD cache or the native file cache is spilled first when storage memory must shrink. For example, when most queries read the same data sources, we prefer to keep the native file cache.
- Introduce `NativeMemoryStore`, which provides interfaces similar to vanilla `MemoryStore` and calls `AsyncDataCache::shrink` when eviction is needed.
- Introduce `NativeStorageMemoryAllocator`, a memory allocator used to create `AsyncDataCache`. It is wrapped with a `ReservationListener` to track memory usage in the native cache.
- `VeloxBackend` initialization will be done without creating the cache. `VeloxBackend::setAsyncDatacache` will be called when the memory pools are initialized.
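
To make the sizing rule in the first bullet concrete, here is a minimal sketch of the arithmetic. The class and method names are hypothetical and do not exist in Gluten or Spark. Only the formula `offheap.memory * spark.memory.storageFraction * spark.memory.native.storageFraction` comes from the proposal. The range check on the fractions is an added assumption.

```java
// Hypothetical helper for illustration only; not part of Gluten or Spark.
final class NativeStorageSizing {
    private NativeStorageSizing() {}

    /**
     * Computes the capacity (in bytes) that would be handed to AsyncDataCache:
     * offHeapBytes * storageFraction * nativeStorageFraction.
     */
    static long nativeStorageBytes(long offHeapBytes,
                                   double storageFraction,
                                   double nativeStorageFraction) {
        if (offHeapBytes < 0) {
            throw new IllegalArgumentException("offHeapBytes must be non-negative");
        }
        checkFraction("storageFraction", storageFraction);
        checkFraction("nativeStorageFraction", nativeStorageFraction);
        return (long) (offHeapBytes * storageFraction * nativeStorageFraction);
    }

    // Assumption: both fractions are expected to lie in [0, 1].
    private static void checkFraction(String name, double value) {
        if (value < 0.0 || value > 1.0) {
            throw new IllegalArgumentException(name + " must be in [0, 1], got " + value);
        }
    }
}
```

For example, with 8 GiB of off-heap memory, a storage fraction of 0.5, and a native storage fraction of 0.25, the native file cache would be initialized with 1 GiB.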

The key code path looks like the following:
<img width="650" alt="image" src="https://github.com/apache/incubator-gluten/assets/11849056/57db03d4-cac9-4fe8-9e6d-91c7595fd139">
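
As a rough illustration of the spill preference in the design above, the sketch below models how storage memory might be reclaimed from two pools. All names here (`StorageSpillSketch`, `Evictable`, `Pool`, `free`) are hypothetical stand-ins, not Gluten, Spark, or Velox APIs. It assumes that `spark.memory.storage.preferSpillNative=true` means the native file cache is shrunk first, and that the other pool is used as a fallback when the first cannot free enough. The proposal does not specify either detail.

```java
// Hypothetical model for illustration only; not Gluten, Spark, or Velox code.
final class StorageSpillSketch {
    private StorageSpillSketch() {}

    /** Stand-in for a cache that can release memory; returns the bytes actually freed. */
    interface Evictable {
        long shrink(long targetBytes);
    }

    /** Toy pool that tracks a byte count, standing in for the RDD cache or the native file cache. */
    static final class Pool implements Evictable {
        private long used;

        Pool(long used) {
            this.used = used;
        }

        @Override
        public long shrink(long targetBytes) {
            long freed = Math.min(used, Math.max(0L, targetBytes));
            used -= freed;
            return freed;
        }

        long used() {
            return used;
        }
    }

    /**
     * Frees up to `needed` bytes, shrinking the preferred pool first and
     * falling back to the other pool for any remainder (an assumed policy).
     */
    static long free(long needed, boolean preferSpillNative,
                     Evictable rddCache, Evictable nativeCache) {
        Evictable first = preferSpillNative ? nativeCache : rddCache;
        Evictable second = preferSpillNative ? rddCache : nativeCache;
        long freed = first.shrink(needed);
        if (freed < needed) {
            freed += second.shrink(needed - freed);
        }
        return freed;
    }
}
```

In the real design, the native pool's `shrink` would correspond to `NativeMemoryStore` calling `AsyncDataCache::shrink`, and the freed bytes would be reported through the `ReservationListener`. A workload that repeatedly reads the same data sources would set the preference to `false` so that the native file cache is evicted last.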



