Locating Cache Performance Bottlenecks Using Data Profiling

Authors
Pesterev, Aleksey [1]
Zeldovich, Nickolai [1]
Morris, Robert T. [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Keywords
Cache Misses; Data Profiling; Debug Registers; Statistical Profiling
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Effective use of CPU data caches is critical to good performance, but poor cache use patterns are often hard to spot using existing execution profiling tools. Typical profilers attribute costs to specific code locations. The costs due to frequent cache misses on a given piece of data, however, may be spread over instructions throughout the application. The resulting individually small costs at a large number of instructions can easily appear insignificant in a code profiler's output. DProf helps programmers understand cache miss costs by attributing misses to data types instead of code. Associating cache misses with data helps programmers locate data structures that experience misses in many places in the application's code. DProf introduces a number of new views of cache miss data, including a data profile, which reports the data types with the most cache misses, and a data flow graph, which summarizes how objects of a given type are accessed throughout their lifetime, and which accesses incur expensive cross-CPU cache loads. We present two case studies of using DProf to find and fix cache performance bottlenecks in Linux. The improvements provide a 16-57% throughput improvement on a range of memcached and Apache workloads.
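The contrast the abstract draws between a code profile and a data profile can be illustrated with a small sketch. This is not DProf's implementation: the sample records, function names, and type names below are invented for illustration, and it assumes each cache-miss sample has already been tagged with both the code location and the data type of the missed address.

```python
from collections import Counter

# Hypothetical cache-miss samples: (code location, data type of the missed address).
# All names and counts are invented for illustration; this is not DProf output.
samples = (
    [("tcp_recv", "skbuff")] * 3
    + [("ip_forward", "skbuff")] * 3
    + [("dev_xmit", "skbuff")] * 3
    + [("alloc_slab", "slab")] * 4
)


def code_profile(miss_samples):
    """Count misses per code location, as a conventional code profiler would."""
    return Counter(location for location, _ in miss_samples)


def data_profile(miss_samples):
    """Count misses per data type, the grouping the abstract attributes to DProf."""
    return Counter(data_type for _, data_type in miss_samples)


by_code = code_profile(samples)
by_data = data_profile(samples)

print("By code location:", by_code.most_common())
print("By data type:    ", by_data.most_common())
```

In this made-up data, the code view ranks `alloc_slab` first because the misses on `skbuff` are split across three locations, while the data view shows `skbuff` accounting for most of the misses. That is the effect the abstract describes: costs that look individually small per instruction add up once they are grouped by the data they touch.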
Pages: 335-348
Page count: 14