- 2019 CSUR [Prefetch] Evaluation of Hardware Data Prefetchers on Server Processors
- 2016 CSUR [Prefetch] A Survey of Recent Prefetching Techniques for Processor Caches
- [CMP Cache] Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors
- [Mapping, Replacement] The V-Way Cache: Demand Based Associativity via Global Replacement
- [CMP Cache] Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
- [Instruction Prefetch, CMP Cache] Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications
- [Replacement] A Case for MLP-Aware Cache Replacement
- [CMP Cache] Cooperative Caching for Chip Multiprocessors
- [Prefetch] Spatial Memory Streaming
- [Prefetch] Memory Prefetching Using Adaptive Stream Detection
- [CMP Cache] Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches
- [CMP Cache] Molecular Caches: A caching structure for dynamic creation of application-specific Heterogeneous cache regions
- [CMP Cache] ASR: Adaptive Selective Replication for CMP Caches
- [CMP Cache] Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
- [CMP Cache] Comparing memory systems for chip multiprocessors
- [Replacement] Adaptive insertion policies for high performance caching
- [Replacement] Scavenger: A New Last Level Cache Architecture with Global Block Priority
- [Prefetch] A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy
- [CMP Cache, Mapping] An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors
- [Prefetch Enhancement] Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers
- [Prefetch] Accelerating and Adapting Precomputation Threads for Efficient Prefetching
- [CMP Cache, Compression, Prefetch] Interactions Between Compression and Prefetching in Chip Multiprocessors
- [Cache Modeling] Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs
- [Dead Block] Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency
- [Dead Block] Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer
- [CMP Cache, Mapping] Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems
- [CMP Cache] Multi-execution: multicore caching for data-similar executions
- [Replacement] PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
- [Cache Modeling] ECMon: exposing cache events for monitoring
- [Prefetch] Stream chaining: exploiting multiple levels of correlation in data prefetching
- [Prefetch] Spatio-temporal memory streaming
- [Replacement] Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches
- [CMP Cache] Optimizing shared cache behavior of chip multiprocessors
- [CMP Cache] SHARP control: controlled shared cache management in chip multiprocessors
- [Mapping] Adaptive line placement with the set balancing cache
- [CMP Prefetch] Coordinated control of multiple prefetchers in multi-core systems
- [CMP Cache] Adaptive Spill-Receive for robust high-performance caching in CMPs
- [CMP Cache] Design and implementation of software-managed caches for multicores with local memory
- [Prefetch] Practical off-chip meta-data for temporal memory streaming
- [Cache Modeling] A first-order fine-grained multithreaded throughput model
- [CMP Prefetch] Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems
- [Prefetch Enhancement] Feedback mechanisms for improving probabilistic memory prefetching
- [CMP Cache] Shared caches in multicores: the good, the bad, and the ugly
- [Replacement] High performance cache replacement using re-reference interval prediction (RRIP)
- [Inclusion Policy] Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies
- [Dead Block] Sampling Dead Block Prediction for Last-Level Caches
- [Mapping] The ZCache: Decoupling Ways and Associativity
- [Mapping] Vantage: scalable and efficient fine-grain cache partitioning
- [Replacement, Bypass] Bypass and insertion algorithms for exclusive last-level caches
- [CMP Prefetch] Prefetch-aware shared resource management for multi-core systems
- [Replacement] SHiP: signature-based hit predictor for high performance caching
- [Replacement, Prefetch] PACMan: prefetch-aware cache management for high performance caching
- [CMP Cache] CloudCache: Expanding and shrinking private caches
- [CMP Cache, Hierarchy] MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy
- [Mapping] NUcache: An efficient multicore cache organization based on Next-Use distance
- [CMP Cache] ACCESS: Smart scheduling for asymmetric cache CMPs
- [Inclusion Policy] FLEXclusion: Balancing cache capacity and on-chip bandwidth via Flexible Exclusion
- [Replacement] Improving Cache Management Policies Using Dynamic Reuse Distances
- [Structure] Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
- [CMP Cache] Adaptive Set-Granular Cooperative Caching
- [Replacement] Decoupled dynamic cache segmentation
- [Cache Modeling] WEST: Cloning data cache behavior using Stochastic Traces
- [Replacement] Insertion and promotion for tree-based PseudoLRU last-level caches
- [Mapping] Imbalanced cache partitioning for balanced data-parallel programs
- [Prefetch] Linearizing irregular memory accesses for improved correlated prefetching
- [Instruction Prefetch] RDIP: return-address-stack directed instruction prefetching
- [Instruction Prefetch] SHIFT: shared history instruction fetch for lean-core server processors
- [Prefetch] RECAP: A region-based cure for the common cold (cache)
- [CMP Cache] Improving multi-core performance using mixed-cell cache architecture
- [Cache Modeling] Modeling performance variation due to cache sharing
- [Structure] Navigating the cache hierarchy with a single lookup
- [Mapping] Going vertical in memory management: Handling multiplicity by multi-policy
- [Mapping] Futility Scaling: High-Associativity Cache Partitioning
- [Prefetch] BuMP: Bulk Memory Access Prediction and Streaming
- [Prefetch] Loop-Aware Memory Prefetching Using Code Block Working Sets
- [CMP Cache] Locality-aware data replication in the Last-Level Cache
- [Mapping] Improving cache performance using read-write partitioning
- [Prefetch] Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers
- [Cache Modeling] The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory
- [Prefetch] Efficiently prefetching complex address patterns
- [Prefetch] Self-contained, accurate precomputation prefetching
- [Prefetch] IMP: indirect memory prefetcher
- [Prefetch] Talus: A simple way to remove cliffs in cache performance
- [Placement] Priority-based cache allocation in throughput processors
- [Inclusion Policy] High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches
- [Replacement] Back to the Future: Leveraging Belady's Algorithm for Improved Cache Replacement
- [Inclusion Policy] LAP: Loop-Block Aware Inclusion Properties for Energy-Efficient Asymmetric Last Level Caches
- [Prefetch] pTask: A smart prefetching scheme for OS intensive applications
- [Prefetch] Path confidence based lookahead prefetching
- [Replacement] Minimal disturbance placement and promotion
- [Replacement, Cache Modeling] Modeling cache performance beyond LRU
- [Dead Block] RADAR: Runtime-assisted dead region management for last-level caches
- Cache QoS: From concept to reality in the Intel® Xeon® processor E5-2600 v3 product family
- [Prefetch] Best-offset hardware prefetching
- [Hierarchy] Jenga: Software-Defined Cache Hierarchies
- [Replacement] Maximizing Cache Performance Under Uncertainty
- [CMP Cache, Mapping] SWAP: Effective Fine-Grain Management of Shared Last-Level Caches with Minimum Hardware Support
- [Cache Modeling] Fast and Accurate Exploration of Multi-level Caches Using Hierarchical Reuse Distance
- [Prefetch] Division of Labor: A More Effective Approach to Prefetching
- [Prefetch] Criticality Aware Tiered Cache Hierarchy: A Fundamental Relook at Multi-Level Cache Hierarchies
- [Replacement, Prefetch] Rethinking Belady's Algorithm to Accommodate Prefetching
- [ICache Replacement] Exploring Predictive Replacement Policies for Instruction Cache and Branch Target Buffer
- [CMP Cache, Mapping] KPart: A Hybrid Cache Partitioning-Sharing Technique for Commodity Multicores
- SIPT: Speculatively Indexed, Physically Tagged Caches
- [Prefetch] Domino Temporal Data Prefetcher
- [Prefetch] Perceptron-based prefetch filtering
- [Prefetch] Efficient metadata management for irregular data prefetching
- [Replacement] Applying Deep Learning to the Cache Replacement Problem
- [Prefetch] DSPatch: Dual Spatial Pattern Prefetcher
- [Prefetch] Temporal Prefetching Without the Off-Chip Metadata
- [Prefetch] Bingo Spatial Data Prefetcher
- [Cache Modeling] Featherlight Reuse-Distance Measurement
- [Prefetch] Bouquet of Instruction Pointers: Instruction Pointer Classifier-based Spatial Hardware Prefetching
- [Instruction Prefetch] I-SPY: Context-Driven Conditional Instruction Prefetching with Coalescing
- [Prefetch] RnR: A Software-Assisted Record-and-Replay Hardware Prefetcher
- [Replacement, Profile Guided] Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications
- [Prefetch] A Cost-Effective Entangling Prefetcher for Instructions
- [AI Prefetch] Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning
- [Replacement] Designing a Cost-Effective Cache Replacement Policy using Machine Learning
- [Replacement] Stream Floating: Enabling Proactive and Decentralized Cache Optimizations
- [Hierarchy] täkō: a polymorphic cache hierarchy for general-purpose optimization of data movement
- [Prefetch] Register file prefetching
- [Prefetch] Page Size Aware Cache Prefetching
- [Prefetch] Berti: an Accurate Local-Delta Data Prefetcher
- [Prefetch] Merging Similar Patterns for Hardware Prefetching
- [Structure] Reducing Load Latency with Cache Level Prediction
- [Replacement] TCOR: A Tile Cache with Optimal Replacement