Dynamic Cache Contention Detection in Multi-threaded Applications

Cited by: 26
Authors
Zhao, Qin [1]
Koh, David [1]
Raza, Syed [1]
Bruening, Derek [2]
Wong, Weng-Fai [3]
Amarasinghe, Saman [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] Google Inc, Mountain View, CA USA
[3] Natl Univ Singapore, Sch Comp, Singapore 117548, Singapore
Keywords
Performance; False Sharing; Cache Contention; Shadow Memory; Dynamic Instrumentation; Data Race Detector
DOI
10.1145/2007477.1952688
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy. In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach, a 5x slowdown on average relative to native execution, is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications by up to a factor of 12. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.
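The key insight in the abstract, that the amount of sharing depends mainly on the cache line size and on which bytes each thread touches, can be illustrated with a small sketch. This is not the paper's tool or algorithm: the trace format, the `classify_sharing` function, the four category names, and the 64-byte line size are all assumptions made for illustration. The sketch keeps a per-byte "shadow" record of which threads read or wrote each byte, then classifies each cache line without modeling any cache hierarchy.

```python
from collections import defaultdict

CACHE_LINE_SIZE = 64  # bytes; an assumed value, common on x86 hardware


def classify_sharing(trace, line_size=CACHE_LINE_SIZE):
    """Classify every cache line touched in a memory-access trace.

    `trace` is an iterable of (thread_id, address, size, is_write) tuples
    (a hypothetical format, not the paper's). Returns a dict mapping
    line index -> "private", "read-shared", "true", or "false".
    """
    # Shadow state: line -> byte offset -> (reader thread ids, writer thread ids)
    shadow = defaultdict(lambda: defaultdict(lambda: (set(), set())))
    for tid, addr, size, is_write in trace:
        for byte in range(addr, addr + size):
            line, offset = divmod(byte, line_size)
            readers, writers = shadow[line][offset]
            (writers if is_write else readers).add(tid)

    result = {}
    for line, offsets in shadow.items():
        threads, line_writers = set(), set()
        byte_conflict = False
        for readers, writers in offsets.values():
            threads |= readers | writers
            line_writers |= writers
            # The same byte is touched by 2+ threads and at least one writes it.
            if writers and len(readers | writers) > 1:
                byte_conflict = True
        if len(threads) < 2:
            result[line] = "private"
        elif not line_writers:
            result[line] = "read-shared"  # no writes, so no invalidations
        elif byte_conflict:
            result[line] = "true"         # threads really share the data
        else:
            result[line] = "false"        # disjoint bytes, same line
    return result


if __name__ == "__main__":
    demo = [
        (0, 0, 4, True), (1, 4, 4, True),       # adjacent fields, one line
        (0, 64, 4, True), (1, 64, 4, False),    # same word, writer + reader
    ]
    print(classify_sharing(demo))
```

Changing `line_size` changes the classification of the same trace, which is the point of the insight: two threads writing bytes 0 and 4 falsely share a 64-byte line but not a 4-byte one. A real tool would observe accesses through instrumentation rather than a prepared list, and would track ordering over time, which this sketch ignores.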
Pages: 27-37
Page count: 11