Dynamic Cache Contention Detection in Multi-threaded Applications

Cited by: 26
Authors
Zhao, Qin [1]
Koh, David [1]
Raza, Syed [1]
Bruening, Derek [2]
Wong, Weng-Fai [3]
Amarasinghe, Saman [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] Google Inc, Mountain View, CA USA
[3] Natl Univ Singapore, Sch Comp, Singapore 117548, Singapore
Keywords
Performance; False Sharing; Cache Contention; Shadow Memory; Dynamic Instrumentation; Data Race Detector
DOI
10.1145/2007477.1952688
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy. In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach (a 5x slowdown on average relative to native execution) is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications by up to a factor of 12. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.
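The key insight in the abstract, that the amount of sharing depends mainly on cache line size and application behavior, can be illustrated with a small sketch. The Python code below is not the authors' tool: it replaces dynamic instrumentation with a hand-written access trace, and the names `classify_sharing`, `CACHE_LINE_SIZE`, and `WORD_SIZE`, along with the 64-byte line and 4-byte word granularity, are assumptions made for illustration. It keeps a shadow record per cache line of which threads touched which words, then labels a contended line as true sharing when some word is touched by several threads with at least one writer, and as false sharing when the threads only touch different words of the same line.

```python
from collections import defaultdict

CACHE_LINE_SIZE = 64  # bytes; assumed value, common on x86
WORD_SIZE = 4         # shadow granularity in bytes; assumed value


def classify_sharing(trace):
    """Classify every cache line touched by `trace`.

    `trace` is an iterable of (thread_id, address, is_write) tuples, one per
    word-sized access. Returns a dict mapping line number to one of
    "private", "read-shared", "true", or "false".
    """
    # Shadow state: line -> word index -> {thread_id: has_written}
    shadow = defaultdict(lambda: defaultdict(dict))
    for tid, addr, is_write in trace:
        line = addr // CACHE_LINE_SIZE
        word = (addr % CACHE_LINE_SIZE) // WORD_SIZE
        per_thread = shadow[line][word]
        per_thread[tid] = per_thread.get(tid, False) or is_write

    result = {}
    for line, words in shadow.items():
        threads, writers = set(), set()
        for per_thread in words.values():
            threads.update(per_thread)
            writers.update(t for t, wrote in per_thread.items() if wrote)
        if len(threads) < 2:
            result[line] = "private"
        elif not writers:
            result[line] = "read-shared"
        else:
            # True sharing: some single word is touched by 2+ threads,
            # at least one of which writes it. Otherwise the threads only
            # collide at cache-line granularity, which is false sharing.
            is_true = any(
                len(pt) > 1 and any(pt.values()) for pt in words.values()
            )
            result[line] = "true" if is_true else "false"
    return result


# Hypothetical trace: two threads writing adjacent words (false sharing),
# a writer and a reader on the same word (true sharing), one private line.
example_trace = [
    (0, 0x1000, True),
    (1, 0x1004, True),
    (0, 0x2000, True),
    (1, 0x2000, False),
    (0, 0x3000, True),
]
example_result = classify_sharing(example_trace)
```

Nothing in this classification depends on thread-to-core binding or cache capacity; only the line size and the access pattern matter, which is why such information can be gathered without simulating the memory hierarchy.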
Pages: 27-37
Page count: 11