Last level cache (LLC) performance of data mining workloads on a CMP - A case study of parallel bioinformatics workloads

Cited by: 17
Authors
Jaleel, Aamer [1]
Mattina, Matthew [2]
Jacob, Bruce [3]
Affiliations
[1] Intel Corp, VSSAD, Santa Clara, CA 95051 USA
[2] Tilera Corp, San Jose, CA 95134 USA
[3] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
DOI
10.1109/HPCA.2006.1598115
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With the continuing growth in the amount of genetic data, members of the bioinformatics community are developing a variety of data-mining applications to understand the data and discover meaningful information. These applications are important in defining the design and performance decisions of future high-performance microprocessors. This paper presents a detailed data-sharing analysis and chip-multiprocessor (CMP) cache study of several multi-threaded data-mining bioinformatics workloads. For a CMP with a three-level cache hierarchy, we model the last level of the cache hierarchy as either multiple private caches or a single cache shared amongst the different cores of the CMP. Our experiments show that the bioinformatics workloads exhibit significant data-sharing: 50-95% of the data cache is shared by the different threads of the workload. Furthermore, regardless of the amount of data cache shared, for some workloads as many as 98% of the accesses to the last-level cache are to shared data cache lines. Additionally, the amount of data-sharing exhibited by the workloads is a function of the total cache size available: the larger the data cache, the better the sharing behavior. Thus, partitioning the available last-level cache silicon area into multiple private caches can cause applications to lose their inherent data-sharing behavior. For the workloads in this study, a shared 32MB last-level cache is able to capture a tremendous amount of data-sharing and outperform a 32MB private cache configuration by several orders of magnitude. Specifically, with shared last-level caches the bandwidth demands beyond the last-level cache can be reduced by factors of 3-625 when compared to private last-level caches.
Pages: 88 / +
Page count: 4
Related papers
26 in total
  • [1] Performance evaluation of a novel CMP cache structure for hybrid workloads
    Zhao, Xuemei
    Sammut, Karl
    He, Fangpo
    EIGHTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2007, : 89 - 96
  • [2] ROBUS: Fair Cache Allocation for Data-parallel Workloads
    Kunjir, Mayuresh
    Fain, Brandon
    Munagala, Kamesh
    Babu, Shivnath
    SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017, : 219 - 234
  • [3] Characterizing the impact of last-level cache replacement policies on big-data workloads
    Jamet, Alexandre Valentin
    Alvarez, Lluc
    Jimenez, Daniel A.
    Casas, Marc
    2020 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2020), 2020, : 134 - 144
  • [4] An architectural characterization study of data mining and bioinformatics workloads
    Ozisikyilmaz, Berkin
    Narayanan, Ramanathan
    Zambreno, Joseph
    Memik, Gokhan
    Choudhary, Alok
    PROCEEDINGS OF THE IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION, 2006, : 61 - +
  • [5] Analyzing Data Reference Characteristics of Deep Learning Workloads for Improving Buffer Cache Performance
    Lee, Jeongha
    Bahn, Hyokyung
    APPLIED SCIENCES-BASEL, 2023, 13 (22):
  • [6] Scheduling Data Parallel Workloads - A Comparative Study of Two Common Algorithmic Approaches
    Balasubramaniam, Mahadevan
    Banicescu, Ioana
    Ciorba, Florina M.
    2013 42ND ANNUAL INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2013, : 798 - 807
  • [7] Using Parallel Programming Models for Automotive Workloads on Heterogeneous Systems - a Case Study
    Sommer, Lukas
    Stock, Florian
    Solis-Vasquez, Leonardo
    Koch, Andreas
    2020 28TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2020), 2020, : 17 - 21
  • [8] A Performance Study of Big Data Workloads in Cloud Datacenters with Network Variability
    Uta, Alexandru
    Obaseki, Harry
    COMPANION OF THE 2018 ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING (ICPE '18), 2018, : 113 - 118
  • [9] Towards a Better Cache Utilization by Selective Data Storage for CMP Last Level Caches
    Das, Shirshendu
    Kapoor, Hemangee K.
    2016 29TH INTERNATIONAL CONFERENCE ON VLSI DESIGN AND 2016 15TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS (VLSID), 2016, : 92 - 97
  • [10] Performance of commercial multimedia workloads on the Intel Pentium 4: A case study
    Martinez, Christopher
    Pinnamaneni, Mythri
    John, Eugene B.
    COMPUTERS & ELECTRICAL ENGINEERING, 2009, 35 (01) : 18 - 32