18 results in total
- [1] Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU. EURO-PAR 2012: PARALLEL PROCESSING WORKSHOPS, 2013, 7640: 207-216.
- [2] PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023, 2023: 167-179.
- [3] Analytic performance model for parallel overlapping memory-bound kernels. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34(10).
- [4] Scalable Kernel Fusion for Memory-Bound GPU Applications. SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014: 191-202.
- [5] Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators. HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2012, 2013, 7851: 72-79.
- [6] A practical performance model for compute and memory bound GPU kernels. 23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015), 2015: 651-658.
- [7] Harvesting Memory-bound CPU Stall Cycles in Software with MSH. PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2024, 2024: 57-75.
- [8] Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications. 2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA, 2023: 718-730.
- [9] Accelerating the Unacceleratable: Hybrid CPU/GPU Algorithms for Memory-Bound Database Primitives. 15TH INTERNATIONAL WORKSHOP ON DATA MANAGEMENT ON NEW HARDWARE (DAMON 2019), 2019.
- [10] Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs. 2016 IEEE 10TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC), 2016: 377-384.