Last level cache (LLC) performance of data mining workloads on a CMP - A case study of parallel bioinformatics workloads

Cited by: 17

Authors:

Jaleel, Aamer [1]

Mattina, Matthew [2]

Jacob, Bruce [3]

Affiliations:

[1] Intel Corp, VSSAD, Santa Clara, CA 95051 USA

[2] Tilera Corp, San Jose, CA 95134 USA

[3] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA

Source:

TWELFTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS | 2006

Keywords:

DOI:

10.1109/HPCA.2006.1598115

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

With the continuing growth in the amount of genetic data, members of the bioinformatics community are developing a variety of data-mining applications to understand the data and discover meaningful information. These applications are important in defining the design and performance decisions of future high-performance microprocessors. This paper presents a detailed data-sharing analysis and chip-multiprocessor (CMP) cache study of several multi-threaded data-mining bioinformatics workloads. For a CMP with a three-level cache hierarchy, we model the last level of the cache hierarchy as either multiple private caches or a single cache shared amongst different cores of the CMP. Our experiments show that the bioinformatics workloads exhibit significant data sharing: 50-95% of the data cache is shared by the different threads of the workload. Furthermore, regardless of the amount of data cache shared, for some workloads as many as 98% of the accesses to the last-level cache are to shared data cache lines. Additionally, the amount of data sharing exhibited by the workloads is a function of the total cache size available: the larger the data cache, the better the sharing behavior. Thus, partitioning the available last-level cache silicon area into multiple private caches can cause applications to lose their inherent data-sharing behavior. For the workloads in this study, a shared 32MB last-level cache is able to capture a tremendous amount of data sharing and outperform a 32MB private cache configuration by several orders of magnitude. Specifically, with shared last-level caches the bandwidth demands beyond the last-level cache can be reduced by factors of 3-625 when compared to private last-level caches.
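
The shared-versus-private effect described in the abstract can be illustrated with a toy model. The sketch below is not the paper's methodology: it assumes fully associative LRU caches, a synthetic access pattern in which every thread repeatedly scans the same working set, and hypothetical names and sizes (`LRUCache`, `simulate`, 4 cores, 64 lines of total capacity). It only shows why splitting the same capacity into private slices can increase traffic beyond the last-level cache when threads share data.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that counts misses (toy model)."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return
        self.misses += 1  # this request goes beyond the last-level cache
        self.lines[line] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line


def simulate(num_cores, total_lines, working_set, passes, shared):
    """Count LLC misses when every thread scans the same working set.

    All parameters are illustrative assumptions, not values from the paper.
    """
    if shared:
        llc = LRUCache(total_lines)
        unique_caches = [llc]
        per_core = [llc] * num_cores  # every core uses the single shared cache
    else:
        unique_caches = [LRUCache(total_lines // num_cores)
                         for _ in range(num_cores)]
        per_core = unique_caches  # each core gets its own slice of the capacity
    for _ in range(passes):
        for line in range(working_set):
            for core in range(num_cores):
                per_core[core].access(line)
    return sum(cache.misses for cache in unique_caches)


shared_misses = simulate(4, 64, 48, 10, shared=True)
private_misses = simulate(4, 64, 48, 10, shared=False)
print(shared_misses, private_misses)
```

In this synthetic setup the shared cache holds the whole 48-line working set, so it only takes the initial cold misses, while each 16-line private cache thrashes on every pass. Real workloads, set-associative caches, and coherence traffic would change the numbers.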


Pages: 88 / +

Page count: 4

26 items in total

[1] Performance evaluation of a novel CMP cache structure for hybrid workloads
Zhao, Xuemei
Sammut, Karl
He, Fangpo
EIGHTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2007, : 89 - 96
[2] ROBUS: Fair Cache Allocation for Data-parallel Workloads
Kunjir, Mayuresh
Fain, Brandon
Munagala, Kamesh
Babu, Shivnath
SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017, : 219 - 234
[3] Characterizing the impact of last-level cache replacement policies on big-data workloads
Jamet, Alexandre Valentin
Alvarez, Lluc
Jimenez, Daniel A.
Casas, Marc
2020 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2020), 2020, : 134 - 144
[4] An architectural characterization study of data mining and bioinformatics workloads
Ozisikyilmaz, Berkin
Narayanan, Ramanathan
Zambreno, Joseph
Memik, Gokhan
Choudhary, Alok
PROCEEDINGS OF THE IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION, 2006, : 61 - +
[5] Analyzing Data Reference Characteristics of Deep Learning Workloads for Improving Buffer Cache Performance
Lee, Jeongha
Bahn, Hyokyung
APPLIED SCIENCES-BASEL, 2023, 13 (22):
[6] Scheduling Data Parallel Workloads - A Comparative Study of Two Common Algorithmic Approaches
Balasubramaniam, Mahadevan
Banicescu, Ioana
Ciorba, Florina M.
2013 42ND ANNUAL INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2013, : 798 - 807
[7] Using Parallel Programming Models for Automotive Workloads on Heterogeneous Systems - a Case Study
Sommer, Lukas
Stock, Florian
Solis-Vasquez, Leonardo
Koch, Andreas
2020 28TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2020), 2020, : 17 - 21
[8] A Performance Study of Big Data Workloads in Cloud Datacenters with Network Variability
Uta, Alexandru
Obaseki, Harry
COMPANION OF THE 2018 ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING (ICPE '18), 2018, : 113 - 118
[9] Towards a Better Cache Utilization by Selective Data Storage for CMP Last Level Caches
Das, Shirshendu
Kapoor, Hemangee K.
2016 29TH INTERNATIONAL CONFERENCE ON VLSI DESIGN AND 2016 15TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS (VLSID), 2016, : 92 - 97
[10] Performance of commercial multimedia workloads on the Intel Pentium 4: A case study
Martinez, Christopher
Pinnamaneni, Mythri
John, Eugene B.
COMPUTERS & ELECTRICAL ENGINEERING, 2009, 35 (01) : 18 - 32
