Dynamic Cache Contention Detection in Multi-threaded Applications

Cited by: 26
Authors
Zhao, Qin [1]
Koh, David [1]
Raza, Syed [1]
Bruening, Derek [2]
Wong, Weng-Fai [3]
Amarasinghe, Saman [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] Google Inc, Mountain View, CA USA
[3] Natl Univ Singapore, Sch Comp, Singapore 117548, Singapore
Keywords
Performance; False Sharing; Cache Contention; Shadow Memory; Dynamic Instrumentation; Data Race Detector
DOI
10.1145/2007477.1952688
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy. In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach (a 5x slowdown on average relative to native execution) is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications by up to a factor of 12. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.
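The key insight in the abstract, that sharing depends mainly on the cache line size and the application's access pattern, can be illustrated with a toy model. The sketch below is not the authors' tool: it replaces dynamic instrumentation with explicit `access` calls, and the class name `SharingDetector`, the 64-byte line size, and the per-byte last-writer table are all assumptions made for illustration. It keeps shadow state per cache line (which threads hold a valid copy) and classifies an access as true sharing if another thread last wrote the bytes being touched, or as false sharing if another thread only wrote different bytes of the same line.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache line size in bytes


class SharingDetector:
    """Toy shadow-memory model; names and structure are illustrative only."""

    def __init__(self, line_size=LINE_SIZE):
        self.line_size = line_size
        self.holders = defaultdict(set)  # line -> threads with a valid copy
        self.dirty_lines = set()         # lines written at least once
        self.last_writer = {}            # byte address -> last writing thread
        self.counts = {"true": 0, "false": 0}

    def access(self, tid, addr, size, is_write):
        """Record one access; assumes it does not straddle two cache lines."""
        line = addr // self.line_size
        touched = range(addr, addr + size)
        kind = None
        if tid not in self.holders[line] and line in self.dirty_lines:
            # Some other thread wrote this line, so this thread has no
            # valid copy. Check whether that write hit the same bytes.
            overlap = any(self.last_writer.get(b, tid) != tid for b in touched)
            kind = "true" if overlap else "false"
            self.counts[kind] += 1
        if is_write:
            self.holders[line] = {tid}   # a write invalidates other copies
            self.dirty_lines.add(line)
            for b in touched:
                self.last_writer[b] = tid
        else:
            self.holders[line].add(tid)
        return kind


detector = SharingDetector()
detector.access(tid=0, addr=0, size=8, is_write=True)   # first touch of the line
detector.access(tid=1, addr=8, size=8, is_write=True)   # same line, other bytes
detector.access(tid=0, addr=0, size=8, is_write=True)   # own bytes, line was taken
detector.access(tid=1, addr=0, size=8, is_write=False)  # reads thread 0's bytes
print(detector.counts)
```

Nothing in this model depends on cache capacity, associativity, or thread-to-core binding, which is why such bookkeeping can be far cheaper than full cache simulation; the trade-off is that it reports the amount of sharing, not the resulting slowdown.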
Pages: 27-37
Number of pages: 11
Related papers
50 in total
  • [41] Performance Evaluation of Virtualization Tools in Multi-Threaded Applications
    Sabolski, Ivan
    Leventic, Hrvoje
    Galic, Irena
    [J]. INTERNATIONAL JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING SYSTEMS, 2014, 5 (02) : 57 - 62
  • [42] Multi-Threaded Parallel I/O for OpenMP Applications
    Mehta, Kshitij
    Gabriel, Edgar
    [J]. INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2015, 43 (02) : 286 - 309
  • [43] Automated Bug Detection for High-level Synthesis of Multi-threaded Irregular Applications
    Fezzardi, Pietro
    Ferrandi, Fabrizio
    [J]. ACM TRANSACTIONS ON PARALLEL COMPUTING, 2020, 7 (04)
  • [44] Effective Resource Handling Data Splitting and Cache Implementation for Multi-Threaded Application
    Kumar, Rajeesh N., V
    Gayathri, G.
    [J]. PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON CIRCUIT, POWER AND COMPUTING TECHNOLOGIES (ICCPCT 2016), 2016,
  • [45] Characterizing Multi-threaded Applications for Designing Sharing-aware Last-level Cache Replacement Policies
    Natarajan, Ragavendra
    Chaudhuri, Mainak
    [J]. 2013 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2013), 2013, : 1 - +
  • [46] Fault Detection in Multi-Threaded C++ Server Applications (Poster Abstract)
    Muehlenfeld, Arndt
    Wotawa, Franz
    [J]. PROCEEDINGS OF THE 2007 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING PPOPP'07, 2007, : 142 - 143
  • [47] An efficient multi-level trace toolkit for multi-threaded applications
    Danjean, V
    Namyst, R
    Wacrenier, PA
    [J]. EURO-PAR 2005 PARALLEL PROCESSING, PROCEEDINGS, 2005, 3648 : 166 - 175
  • [48] Performance and energy metrics for multi-threaded applications on DVFS processors
    Rauber, Thomas
    Ruenger, Gudula
    Stachowski, Matthias
    [J]. SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2018, 17 : 55 - 68
  • [49] A scalability prediction approach for multi-threaded applications on manycore processors
    Bai, Xiuxiu
    Wang, Endong
    Dong, Xiaoshe
    Zhang, Xingjun
    [J]. JOURNAL OF SUPERCOMPUTING, 2015, 71 (11) : 4072 - 4094
  • [50] Multi-Threaded Graph Partitioning
    LaSalle, Dominique
    Karypis, George
    [J]. IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), 2013, : 225 - 236