Last level cache (LLC) performance of data affining workloads on a CMP - A case study of parallel bioinformatics workloads

Cited by: 17
Authors
Jaleel, Aamer [1 ]
Mattina, Matthew [2 ]
Jacob, Bruce [3 ]
Affiliations
[1] Intel Corp, VSSAD, Santa Clara, CA 95051 USA
[2] Tilera Corp, San Jose, CA 95134 USA
[3] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
DOI
10.1109/HPCA.2006.1598115
CLC classification: TP3 [Computing Technology, Computer Technology]
Subject classification: 0812
Abstract
With the continuing growth in the amount of genetic data, members of the bioinformatics community are developing a variety of data-mining applications to understand the data and discover meaningful information. These applications are important in defining the design and performance decisions of future high performance microprocessors. This paper presents a detailed data-sharing analysis and chip-multiprocessor (CMP) cache study of several multi-threaded data-mining bioinformatics workloads. For a CMP with a three-level cache hierarchy, we model the last level of the cache hierarchy as either multiple private caches or a single cache shared amongst the different cores of the CMP. Our experiments show that the bioinformatics workloads exhibit significant data-sharing: 50-95% of the data cache is shared by the different threads of the workload. Furthermore, regardless of the amount of data cache shared, for some workloads as many as 98% of the accesses to the last-level cache are to shared data cache lines. Additionally, the amount of data-sharing exhibited by the workloads is a function of the total cache size available: the larger the data cache, the better the sharing behavior. Thus, partitioning the available last-level cache silicon area into multiple private caches can cause applications to lose their inherent data-sharing behavior. For the workloads in this study, a shared 32MB last-level cache is able to capture a tremendous amount of data-sharing and outperform a 32MB private cache configuration by several orders of magnitude. Specifically, with shared last-level caches, the bandwidth demands beyond the last-level cache can be reduced by factors of 3-625 when compared to private last-level caches.
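The abstract's core argument — that private last-level caches force each core to fetch its own copy of shared lines, inflating off-chip traffic, while a shared LLC keeps a single copy — can be sketched with a toy cold-miss count. This is an illustrative sketch only, not the paper's simulator; the function, core count, and sharing ratio below are assumptions chosen to echo the reported 50-95% sharing.

```python
# Toy model: count off-chip line fetches for one pass over the working set,
# comparing a single shared LLC against per-core private LLCs. Assumes the
# caches are large enough that only cold (first-touch) misses occur.

def cold_fetches(num_cores, shared_lines, private_lines, shared_llc):
    """Return the number of off-chip line fetches needed to warm the LLC(s)."""
    if shared_llc:
        # One copy of each shared line serves every core; private data is
        # still fetched once per owning core.
        return shared_lines + num_cores * private_lines
    # With private LLCs, every core must fetch its own copy of each shared line.
    return num_cores * (shared_lines + private_lines)

cores = 8
shared, private = 9500, 500  # 95% of the working set shared, as a hypothetical

print(cold_fetches(cores, shared, private, shared_llc=True))   # 13500
print(cold_fetches(cores, shared, private, shared_llc=False))  # 80000
```

With these assumed numbers, the private configuration fetches roughly 6x more lines from off chip, showing in miniature why heavy data-sharing favors a shared LLC; the paper's measured bandwidth reductions (factors of 3-625) come from full workload simulation, not this counting argument.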
Pages: 88 / +
Number of pages: 4