Last level cache (LLC) performance of data affining workloads on a CMP - A case study of parallel bioinformatics workloads

Cited by: 17
Authors
Jaleel, Aamer [1 ]
Mattina, Matthew [2 ]
Jacob, Bruce [3 ]
Affiliations
[1] Intel Corp, VSSAD, Santa Clara, CA 95051 USA
[2] Tilera Corp, San Jose, CA 95134 USA
[3] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
DOI
10.1109/HPCA.2006.1598115
CLC classification: TP3 [Computing Technology, Computer Technology]
Subject classification: 0812
Abstract
With the continuing growth in the amount of genetic data, members of the bioinformatics community are developing a variety of data-mining applications to understand the data and discover meaningful information. These applications are important in defining the design and performance decisions of future high performance microprocessors. This paper presents a detailed data-sharing analysis and chip-multiprocessor (CMP) cache study of several multi-threaded data-mining bioinformatics workloads. For a CMP with a three-level cache hierarchy, we model the last level of the cache hierarchy as either multiple private caches or a single cache shared amongst the different cores of the CMP. Our experiments show that the bioinformatics workloads exhibit significant data-sharing: 50-95% of the data cache is shared by the different threads of the workload. Furthermore, regardless of the amount of data cache shared, for some workloads as many as 98% of the accesses to the last-level cache are to shared data cache lines. Additionally, the amount of data-sharing exhibited by the workloads is a function of the total cache size available: the larger the data cache, the better the sharing behavior. Thus, partitioning the available last-level cache silicon area into multiple private caches can cause applications to lose their inherent data-sharing behavior. For the workloads in this study, a shared 32MB last-level cache is able to capture a tremendous amount of data-sharing and outperform a 32MB private cache configuration by several orders of magnitude. Specifically, with shared last-level caches, the bandwidth demands beyond the last-level cache can be reduced by factors of 3-625 when compared to private last-level caches.
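The abstract's core argument — that private last-level caches force each core to fetch its own copy of shared lines, inflating off-chip traffic, while a shared LLC keeps a single copy — can be sketched with a toy cold-miss count. This is an illustrative sketch only, not the paper's simulator; the function, core count, and sharing ratio below are assumptions chosen to echo the reported 50-95% sharing.

```python
# Toy model: count off-chip line fetches for one pass over the working set,
# comparing a single shared LLC against per-core private LLCs. Assumes the
# caches are large enough that only cold (first-touch) misses occur.

def cold_fetches(num_cores, shared_lines, private_lines, shared_llc):
    """Return the number of off-chip line fetches needed to warm the LLC(s)."""
    if shared_llc:
        # One copy of each shared line serves every core; private data is
        # still fetched once per owning core.
        return shared_lines + num_cores * private_lines
    # With private LLCs, every core must fetch its own copy of each shared line.
    return num_cores * (shared_lines + private_lines)

cores = 8
shared, private = 9500, 500  # 95% of the working set shared, as a hypothetical

print(cold_fetches(cores, shared, private, shared_llc=True))   # 13500
print(cold_fetches(cores, shared, private, shared_llc=False))  # 80000
```

With these assumed numbers, the private configuration fetches roughly 6x more lines from off chip, showing in miniature why heavy data-sharing favors a shared LLC; the paper's measured bandwidth reductions (factors of 3-625) come from full workload simulation, not this counting argument.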
Pages: 88 / +
Number of pages: 4