Description
The Velox backend provides a two-level file cache (`AsyncDataCache` and `SsdCache`), and we have enabled it in PR, using a dedicated `MMapAllocator` initialized with the configured capacity. This memory is counted as neither execution memory nor storage memory, and it is not managed by Spark's `UnifiedMemoryManager`. In this ticket, we would like to close this gap with the following design:
- Add a `NativeStorageMemory` segment to the vanilla `StorageMemory`. A new configuration, `spark.memory.native.storageFraction`, defines its size. The resulting size, `offheap.memory * spark.memory.storageFraction * spark.memory.native.storageFraction`, is used to initialize `AsyncDataCache`.
- Add a configuration, `spark.memory.storage.preferSpillNative`, that determines whether the RDD cache or the native file cache is spilled first when storage memory has to shrink. For example, when queries mostly run against the same data sources, we prefer to keep the native file cache.
- Introduce `NativeMemoryStore`, which provides interfaces similar to the vanilla `MemoryStore` and calls `AsyncDataCache::shrink` when eviction is needed.
- Introduce `NativeStorageMemoryAllocator`, the memory allocator used to create `AsyncDataCache`. It is wrapped with a `ReservationListener` to track memory usage in the native cache.
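
As a rough illustration of the first two items, the sketch below computes the native cache size from the three settings and derives a spill order from the preference flag. The struct, function, and enum names are hypothetical and not part of Spark, Gluten, or Velox. It also assumes that `spark.memory.storage.preferSpillNative=true` means the native file cache is asked to shrink first, which the description above does not spell out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical holder for the settings named in the proposal.
struct StorageMemoryConfig {
  uint64_t offHeapBytes;         // offheap.memory, in bytes
  double storageFraction;        // spark.memory.storageFraction
  double nativeStorageFraction;  // spark.memory.native.storageFraction
  bool preferSpillNative;        // spark.memory.storage.preferSpillNative
};

// offheap.memory * storageFraction * native.storageFraction, rounded down.
inline uint64_t nativeStorageBytes(const StorageMemoryConfig& c) {
  assert(c.storageFraction >= 0.0 && c.storageFraction <= 1.0);
  assert(c.nativeStorageFraction >= 0.0 && c.nativeStorageFraction <= 1.0);
  const double storageBytes =
      static_cast<double>(c.offHeapBytes) * c.storageFraction;
  return static_cast<uint64_t>(storageBytes * c.nativeStorageFraction);
}

enum class SpillTarget { kRddCache, kNativeFileCache };

// Order in which the two caches would be asked to release memory.
// Assumption: preferSpillNative == true puts the native file cache first.
inline std::vector<SpillTarget> spillOrder(const StorageMemoryConfig& c) {
  if (c.preferSpillNative) {
    return {SpillTarget::kNativeFileCache, SpillTarget::kRddCache};
  }
  return {SpillTarget::kRddCache, SpillTarget::kNativeFileCache};
}
```

With 8 GiB of off-heap memory, a storage fraction of 0.5, and a native storage fraction of 0.25, `nativeStorageBytes` returns 1 GiB, which would be the capacity handed to `AsyncDataCache`.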
`VeloxBackend` initialization will complete without creating the cache. We will call `VeloxBackend::setAsyncDatacache` while the memory pools are being initialized.
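
The deferred setup can be pictured with the minimal sketch below. Every type in it is a hypothetical stand-in, not the actual Gluten or Velox class: the listener interface with `reserve`/`unreserve`, the `TrackingAllocator` playing the role of `NativeStorageMemoryAllocator`, and the `SketchBackend` whose setter only mirrors the name `setAsyncDatacache`. It shows a backend that starts without a cache, receives one later, and an allocator that reports every allocation to its listener.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical listener interface; the method names are illustrative.
class ReservationListener {
 public:
  virtual ~ReservationListener() = default;
  virtual void reserve(uint64_t bytes) = 0;
  virtual void unreserve(uint64_t bytes) = 0;
};

// Listener that only keeps a running total of reserved bytes.
class CountingListener : public ReservationListener {
 public:
  void reserve(uint64_t bytes) override { used_ += bytes; }
  void unreserve(uint64_t bytes) override { used_ -= bytes; }
  uint64_t used() const { return used_; }

 private:
  uint64_t used_ = 0;
};

// Stand-in for NativeStorageMemoryAllocator: reports usage to the listener.
class TrackingAllocator {
 public:
  explicit TrackingAllocator(std::shared_ptr<ReservationListener> listener)
      : listener_(std::move(listener)) {}

  void* allocate(uint64_t bytes) {
    void* p = std::malloc(bytes);
    if (p != nullptr) listener_->reserve(bytes);  // count only on success
    return p;
  }

  void deallocate(void* p, uint64_t bytes) {
    if (p == nullptr) return;
    std::free(p);
    listener_->unreserve(bytes);
  }

 private:
  std::shared_ptr<ReservationListener> listener_;
};

// Stand-in for the cache: it only remembers its allocator and capacity.
struct SketchCache {
  std::shared_ptr<TrackingAllocator> allocator;
  uint64_t capacityBytes;
};

// Stand-in for the backend: constructed without a cache, which is set later.
class SketchBackend {
 public:
  bool hasCache() const { return cache_ != nullptr; }
  void setAsyncDatacache(std::shared_ptr<SketchCache> cache) {
    cache_ = std::move(cache);
  }

 private:
  std::shared_ptr<SketchCache> cache_;
};
```

In this sketch the backend is usable before the cache exists, and the listener's total is the figure a storage memory manager would consult when deciding whether the native cache has to shrink.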
The key code path looks like the following:

