Exploring the performance of massively multithreaded architectures

Cited by: 5
Authors
Bokhari, Shahid [1]
Saltz, Joel [2]
Affiliations
[1] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
[2] Emory Univ, Ctr Comprehens Informat, Atlanta, GA 30322 USA
Source
Funding
U.S. National Science Foundation
Keywords
Cray MTA; Cray XMT; IBM x3755; Itanium; multicore; multithreading; Opteron; parallel computing; parallel algorithms; SGI Altix; shared memory
DOI
10.1002/cpe.1484
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
We present a new scheme for evaluating the performance of multithreaded computers and demonstrate its application to the Cray MTA-2 and XMT supercomputers. Our scheme is based on the concept of clock cycles per element, C, plotted against both problem size and the number of processors. This scheme clearly shows whether an implementation has achieved its asymptotic efficiency and is more general than (but includes) the commonly used speedup metric. It permits the discovery of imperfections in both the software and the hardware, and is expected to permit a unified comparison of many different parallel architectures. Measurements on a number of well-known parallel algorithms, ranging from matrix multiply to quicksort, are presented for the MTA-2 and XMT and highlight some interesting differences between these machines. The performance of sequence alignment using dynamic programming is evaluated on the MTA-2, XMT, IBM x3755 and SGI Altix 350 and provides a useful comparison of the capabilities of the Cray machines with more conventional shared memory architectures. Copyright (c) 2009 John Wiley & Sons, Ltd.
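The abstract names the metric but does not give its formula. As a rough illustration only, the sketch below assumes that C is the wall-clock time multiplied by the clock rate and the processor count, divided by the number of elements processed. That definition, the function names, and all numbers are assumptions made here, not taken from the paper. Under this assumed definition, speedup can be recovered from two values of C, which is one way a cycles-per-element view could include the speedup metric.

```python
def cycles_per_element(seconds, clock_hz, processors, elements):
    """Clock cycles spent per element, summed over all processors.

    Assumed definition (not quoted from the paper):
        C = T * f * p / n
    where T is wall-clock time, f the clock rate, p the processor
    count, and n the problem size in elements.
    """
    return seconds * clock_hz * processors / elements


def speedup_from_c(c_one, c_p, processors):
    """Recover speedup T1/Tp from C on 1 processor and C on p processors.

    From the assumed definition, T = C * n / (f * p), so for a fixed
    problem size and clock rate T1 / Tp = p * C1 / Cp.
    """
    return processors * c_one / c_p


if __name__ == "__main__":
    # Hypothetical timings for one problem of 1,000,000 elements
    # on a machine with a 500 MHz clock (illustrative values only).
    c1 = cycles_per_element(6.0, 500e6, 1, 1_000_000)
    c4 = cycles_per_element(2.0, 500e6, 4, 1_000_000)
    print(c1, c4, speedup_from_c(c1, c4, 4))
```

In this reading, a perfectly scaling implementation keeps C constant as processors are added, while a rising C signals overhead in the software or the hardware.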
Pages: 588-616
Page count: 29
Related papers
50 in total
  • [1] Graph coloring algorithms for multi-core and massively multithreaded architectures
    Çatalyürek, Ümit V.
    Feo, John
    Gebremedhin, Assefaw H.
    Halappanavar, Mahantesh
    Pothen, Alex
    [J]. PARALLEL COMPUTING, 2012, 38 (10-11) : 576 - 594
  • [2] Designing Next-Generation Massively Multithreaded Architectures for Irregular Applications
    Tumeo, Antonino
    Secchi, Simone
    Villa, Oreste
    [J]. COMPUTER, 2012, 45 (08) : 53 - 61
  • [3] Performance of shared caches on multithreaded architectures
    Chen, YY
    Peir, JK
    King, CT
    [J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 1998, 14 (02) : 499 - 514
  • [4] Performance bounds for distributed memory multithreaded architectures
    Zuberek, WM
    Govindarajan, R
    [J]. 1998 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5, 1998, : 232 - 237
  • [5] Classification and performance evaluation of simultaneous multithreaded architectures
    Krishna, BH
    Govindarajan, R
    [J]. FOURTH INTERNATIONAL CONFERENCE ON HIGH-PERFORMANCE COMPUTING, PROCEEDINGS, 1997, : 34 - 39
  • [6] Analysis of performance limitations in multithreaded multiprocessor architectures
    Zuberek, WM
    [J]. SECOND INTERNATIONAL CONFERENCE ON APPLICATION OF CONCURRENCY TO SYSTEMS DESIGN, PROCEEDINGS, 2001, : 43 - 52
  • [7] Exploring cache performance in multithreaded processors
    Lioupis, D
    Milios, S
    [J]. MICROPROCESSORS AND MICROSYSTEMS, 1997, 20 (10) : 631 - 642
  • [8] Performance and hardware complexity tradeoffs in designing multithreaded architectures
    Bekerman, M
    Mendelson, A
    Sheaffer, G
    [J]. PROCEEDINGS OF THE 1996 CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT '96), 1996, : 24 - 34
  • [9] Evaluating the performance of CSB+-trees on multithreaded architectures
    Rashid, Layali K.
    Hassanein, Wessam M.
    [J]. 2007 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-3, 2007, : 1523 - 1526
  • [10] Latency tolerance: A metric for performance analysis of multithreaded architectures
    Nemawarkar, SS
    Gao, GR
    [J]. 11TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM, PROCEEDINGS, 1997, : 227 - 232