An Experimental Comparison of Cache-oblivious and Cache-conscious Programs

Cited by: 0
Authors
Yotov, Kamen [1]
Roeder, Tom [1]
Pingali, Keshav [1]
Gunnels, John [2]
Gustavson, Fred [2]
Affiliations
[1] Cornell Univ, Ithaca, NY 14853 USA
[2] IBM Corp, TJ Watson Res Ctr, Armonk, NY 10504 USA
Funding
National Science Foundation (USA);
Keywords
Memory hierarchy; Memory Latency; Memory bandwidth; Cache-oblivious algorithms; Cache-conscious algorithms; Numerical Software;
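DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract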
Cache-oblivious algorithms have been advanced as a way of circumventing some of the difficulties of optimizing applications to take advantage of the memory hierarchy of modern microprocessors. These algorithms are based on the divide-and-conquer paradigm: each division step creates sub-problems of smaller size, and when the working set of a sub-problem fits in some level of the memory hierarchy, the computations in that sub-problem can be executed without suffering capacity misses at that level. In this way, divide-and-conquer algorithms adapt automatically to all levels of the memory hierarchy; in fact, for problems like matrix multiplication, matrix transpose, and FFT, these recursive algorithms are optimal to within constant factors for some theoretical models of the memory hierarchy. An important question is the following: how well do carefully tuned cache-oblivious programs perform compared to carefully tuned cache-conscious programs for the same problem? Is there a price for obliviousness, and if so, how much performance do we lose? Somewhat surprisingly, there are few studies in the literature that have addressed this question. This paper reports the results of such a study in the domain of dense linear algebra. Our main finding is that in this domain, even highly optimized cache-oblivious programs perform significantly worse than corresponding cache-conscious programs. We provide insights into why this is so, and suggest research directions for making cache-oblivious algorithms more competitive.
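
The contrast drawn in the abstract can be illustrated with a minimal sketch of matrix multiplication in both styles. This is not the authors' code or their tuned kernels; the function names, the `base` cutoff, and the `tile` parameter are assumptions made for illustration. The recursive version halves the largest dimension until the sub-problem is small, without ever referring to a cache size, while the tiled version takes an explicit block size that a cache-conscious program would tune per machine.

```python
def recursive_matmul(A, B, C, i0, j0, k0, n, m, k, base=2):
    """Cache-oblivious style (illustrative): accumulate
    C[i0:i0+n][j0:j0+m] += A[i0:i0+n][k0:k0+k] * B[k0:k0+k][j0:j0+m]
    by splitting the largest dimension in half. No cache size appears."""
    if max(n, m, k) <= base:
        # Small sub-problem: plain triple loop.
        for i in range(i0, i0 + n):
            for j in range(j0, j0 + m):
                acc = C[i][j]
                for p in range(k0, k0 + k):
                    acc += A[i][p] * B[p][j]
                C[i][j] = acc
        return
    if n >= m and n >= k:
        h = n // 2  # split the rows of A and C
        recursive_matmul(A, B, C, i0, j0, k0, h, m, k, base)
        recursive_matmul(A, B, C, i0 + h, j0, k0, n - h, m, k, base)
    elif m >= k:
        h = m // 2  # split the columns of B and C
        recursive_matmul(A, B, C, i0, j0, k0, n, h, k, base)
        recursive_matmul(A, B, C, i0, j0 + h, k0, n, m - h, k, base)
    else:
        h = k // 2  # split the shared (inner) dimension
        recursive_matmul(A, B, C, i0, j0, k0, n, m, h, base)
        recursive_matmul(A, B, C, i0, j0, k0 + h, n, m, k - h, base)


def tiled_matmul(A, B, C, tile):
    """Cache-conscious style (illustrative): C += A * B using fixed-size
    tiles. `tile` is the machine-dependent parameter one would tune."""
    n, k, m = len(A), len(B), len(B[0])
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for kk in range(0, k, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, m)):
                        acc = C[i][j]
                        for p in range(kk, min(kk + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc


def zeros(n, m):
    """Helper (hypothetical): an n-by-m matrix of zeros as nested lists."""
    return [[0] * m for _ in range(n)]
```

Both functions compute the same product; they differ only in how the iteration space is partitioned. The sketch says nothing about relative performance, which is the question the paper studies experimentally.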
Pages: 93 / +
Page count: 2
Related papers
50 in total
  • [21] Low Depth Cache-Oblivious Algorithms
    Blelloch, Guy E.
    Gibbons, Phillip B.
    Simhadri, Harsha Vardhan
    SPAA '10: PROCEEDINGS OF THE TWENTY-SECOND ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2010, : 189 - 199
  • [22] On the limits of cache-oblivious matrix transposition
    Silvestri, Francesco
    TRUSTWORTHY GLOBAL COMPUTING, 2007, 4661 : 233 - 243
  • [23] Cache-Conscious Wavefront Scheduling
    Rogers, Timothy G.
    O'Connor, Mike
    Aamodt, Tor M.
    2012 IEEE/ACM 45TH INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO-45), 2012, : 72 - 83
  • [24] Cache-oblivious B-trees
    Bender, MA
    Demaine, ED
    Farach-Colton, M
    41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, : 399 - 409
  • [25] Cache-Oblivious Scheduling of Shared Workloads
    Bar, Arian
    Golab, Lukasz
    Ruehrup, Stefan
    Schiavone, Mirko
    Casas, Pedro
    2015 IEEE 31ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2015, : 855 - 866
  • [26] Cache-oblivious databases: Limitations and opportunities
    He, Bingsheng
    Luo, Qiong
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2008, 33 (02):
  • [27] Cache-oblivious B-trees
    Bender, MA
    Demaine, ED
    Farach-Colton, M
    SIAM JOURNAL ON COMPUTING, 2005, 35 (02) : 341 - 358
  • [28] On the limits of cache-oblivious rational permutations
    Silvestri, Francesco
    THEORETICAL COMPUTER SCIENCE, 2008, 402 (2-3) : 221 - 233
  • [29] Cache-conscious structure layout
    Chilimbi, TM
    Hill, MD
    Larus, JR
    ACM SIGPLAN NOTICES, 1999, 34 (05) : 1 - 12
  • [30] Cache-Oblivious R-Trees
    Arge, Lars
    de Berg, Mark
    Haverkort, Herman
    ALGORITHMICA, 2009, 53 (01) : 50 - 68