An Experimental Comparison of Cache-oblivious and Cache-conscious Programs

Cited by: 0
Authors
Yotov, Kamen [1]
Roeder, Tom [1]
Pingali, Keshav [1]
Gunnels, John [2]
Gustavson, Fred [2]
Affiliations
[1] Cornell Univ, Ithaca, NY 14853 USA
[2] IBM Corp, TJ Watson Res Ctr, Armonk, NY 10504 USA
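Funding
National Science Foundation (USA)
Keywords
Memory hierarchy; Memory latency; Memory bandwidth; Cache-oblivious algorithms; Cache-conscious algorithms; Numerical software
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract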
Cache-oblivious algorithms have been advanced as a way of circumventing some of the difficulties of optimizing applications to take advantage of the memory hierarchy of modern microprocessors. These algorithms are based on the divide-and-conquer paradigm: each division step creates sub-problems of smaller size, and when the working set of a sub-problem fits in some level of the memory hierarchy, the computations in that sub-problem can be executed without suffering capacity misses at that level. In this way, divide-and-conquer algorithms adapt automatically to all levels of the memory hierarchy; in fact, for problems like matrix multiplication, matrix transpose, and FFT, these recursive algorithms are optimal to within constant factors for some theoretical models of the memory hierarchy. An important question is the following: how well do carefully tuned cache-oblivious programs perform compared to carefully tuned cache-conscious programs for the same problem? Is there a price for obliviousness, and if so, how much performance do we lose? Somewhat surprisingly, few studies in the literature have addressed this question. This paper reports the results of such a study in the domain of dense linear algebra. Our main finding is that in this domain, even highly optimized cache-oblivious programs perform significantly worse than corresponding cache-conscious programs. We provide insights into why this is so, and suggest research directions for making cache-oblivious algorithms more competitive.
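Illustrative sketch (not taken from the paper): the contrast drawn in the abstract can be made concrete with matrix multiplication. The code below is a minimal Python sketch under stated assumptions; the function names `co_matmul`, `cc_matmul`, and `naive_block`, and the parameters `base` and `tile`, are hypothetical. The recursive version halves the largest dimension until the sub-problem is small, and its `base` cutoff only limits recursion overhead. The tiled version relies on an explicit `tile` size that would have to be tuned to a particular cache.

```python
def naive_block(A, B, C, i0, j0, k0, m, n, p):
    """C[i0:i0+m, j0:j0+n] += A[i0:i0+m, k0:k0+p] * B[k0:k0+p, j0:j0+n]."""
    for i in range(i0, i0 + m):
        for k in range(k0, k0 + p):
            a = A[i][k]
            for j in range(j0, j0 + n):
                C[i][j] += a * B[k][j]


def co_matmul(A, B, C, i0, j0, k0, m, n, p, base=8):
    """Cache-oblivious style: recursively halve the largest dimension.

    No cache size appears here; `base` (assumed >= 1) is only a cutoff
    that keeps recursion overhead small.
    """
    if max(m, n, p) <= base:
        naive_block(A, B, C, i0, j0, k0, m, n, p)
    elif m >= n and m >= p:      # split the rows of A and C
        h = m // 2
        co_matmul(A, B, C, i0, j0, k0, h, n, p, base)
        co_matmul(A, B, C, i0 + h, j0, k0, m - h, n, p, base)
    elif n >= p:                 # split the columns of B and C
        h = n // 2
        co_matmul(A, B, C, i0, j0, k0, m, h, p, base)
        co_matmul(A, B, C, i0, j0 + h, k0, m, n - h, p, base)
    else:                        # split the shared inner dimension
        h = p // 2
        co_matmul(A, B, C, i0, j0, k0, m, n, h, base)
        co_matmul(A, B, C, i0, j0, k0 + h, m, n, p - h, base)


def cc_matmul(A, B, C, tile=8):
    """Cache-conscious style: iterate over fixed-size tiles.

    `tile` is an explicit, machine-dependent tuning parameter.
    """
    m, p, n = len(A), len(B), len(B[0])
    for i0 in range(0, m, tile):
        for k0 in range(0, p, tile):
            for j0 in range(0, n, tile):
                naive_block(A, B, C, i0, j0, k0,
                            min(tile, m - i0),
                            min(tile, n - j0),
                            min(tile, p - k0))
```

Both functions accumulate into a caller-supplied `C`, so a zero-initialized `C` yields the plain product. Pure Python will not show realistic cache behavior; the sketch only shows where a machine parameter does or does not enter each formulation.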
Pages: 93 / +
Page count: 2