Exploring the performance of massively multithreaded architectures

Cited by: 5

Authors:

Bokhari, Shahid [1]

Saltz, Joel [2]

Affiliations:

[1] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA

[2] Emory Univ, Ctr Comprehens Informat, Atlanta, GA 30322 USA

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2010 / Vol. 22 / No. 05

Funding:

U.S. National Science Foundation

Keywords:

Cray MTA; Cray XMT; IBM x3755; Itanium; multicore; multithreading; Opteron; parallel computing; parallel algorithms; SGI Altix; shared memory

DOI:

10.1002/cpe.1484

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

We present a new scheme for evaluating the performance of multithreaded computers and demonstrate its application to the Cray MTA-2 and XMT supercomputers. Our scheme is based on the concept of clock cycles per element, C, plotted against both problem size and the number of processors. This scheme clearly shows whether an implementation has achieved its asymptotic efficiency and is more general than (but includes) the commonly used speedup metric. It permits the discovery of imperfections in both the software and the hardware, and is expected to permit a unified comparison of many different parallel architectures. Measurements on a number of well-known parallel algorithms, ranging from matrix multiply to quicksort, are presented for the MTA-2 and XMT and highlight some interesting differences between these machines. The performance of sequence alignment using dynamic programming is evaluated on the MTA-2, XMT, IBM x3755 and SGI Altix 350 and provides a useful comparison of the capabilities of the Cray machines with more conventional shared memory architectures. Copyright (c) 2009 John Wiley & Sons, Ltd.
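The abstract does not spell out how C is computed, so the following is only a minimal sketch under one plausible assumption: that C counts the total processor clock cycles spent per input element, i.e. C = (elapsed time × clock rate × processors) / elements. The function names and this exact formula are hypothetical and are not taken from the paper. Under this assumed definition, a flat C as the processor count grows corresponds to linear speedup, which illustrates how a cycles-per-element view could subsume the speedup metric.

```python
def cycles_per_element(elapsed_seconds, clock_hz, processors, elements):
    """Assumed definition (not confirmed by the source):
    total clock cycles across all processors, divided by problem size."""
    if processors <= 0 or elements <= 0:
        raise ValueError("processors and elements must be positive")
    return elapsed_seconds * clock_hz * processors / elements


def speedup_from_c(c_one, c_p, processors):
    """Speedup implied by the assumed definition, for a fixed problem size.

    From C_p = T_p * f * p / n it follows that T_p = C_p * n / (f * p),
    so T_1 / T_p = p * C_1 / C_p.
    """
    if c_p <= 0:
        raise ValueError("c_p must be positive")
    return processors * c_one / c_p


if __name__ == "__main__":
    # Illustrative numbers only; these are not measurements from the paper.
    c = cycles_per_element(elapsed_seconds=2.0, clock_hz=200_000_000,
                           processors=4, elements=1_000_000)
    print(f"C = {c:.1f} cycles per element")
    print(f"implied speedup = {speedup_from_c(100.0, 125.0, 8):.2f}")
```

In this reading, a rise in C with processor count signals overhead that a plain speedup curve would show only indirectly.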


Pages: 588-616

Page count: 29

50 items in total

[1] Graph coloring algorithms for multi-core and massively multithreaded architectures
Çatalyürek, Ümit V.
Feo, John
Gebremedhin, Assefaw H.
Halappanavar, Mahantesh
Pothen, Alex
[J]. PARALLEL COMPUTING, 2012, 38 (10-11) : 576 - 594
[2] Designing Next-Generation Massively Multithreaded Architectures for Irregular Applications
Tumeo, Antonino
Secchi, Simone
Villa, Oreste
[J]. COMPUTER, 2012, 45 (08) : 53 - 61
[3] Performance of shared caches on multithreaded architectures
Chen, YY
Peir, JK
King, CT
[J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 1998, 14 (02) : 499 - 514
[4] Performance bounds for distributed memory multithreaded architectures
Zuberek, WM
Govindarajan, R
[J]. 1998 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5, 1998, : 232 - 237
[5] Classification and performance evaluation of simultaneous multithreaded architectures
Krishna, BH
Govindarajan, R
[J]. FOURTH INTERNATIONAL CONFERENCE ON HIGH-PERFORMANCE COMPUTING, PROCEEDINGS, 1997, : 34 - 39
[6] Analysis of performance limitations in multithreaded multiprocessor architectures
Zuberek, WM
[J]. SECOND INTERNATIONAL CONFERENCE ON APPLICATION OF CONCURRENCY TO SYSTEMS DESIGN, PROCEEDINGS, 2001, : 43 - 52
[7] Exploring cache performance in multithreaded processors
Lioupis, D
Milios, S
[J]. MICROPROCESSORS AND MICROSYSTEMS, 1997, 20 (10) : 631 - 642
[8] Performance and hardware complexity tradeoffs in designing multithreaded architectures
Bekerman, M
Mendelson, A
Sheaffer, G
[J]. PROCEEDINGS OF THE 1996 CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT '96), 1996, : 24 - 34
[9] Evaluating the performance of CSB+-trees on multithreaded architectures
Rashid, Layali K.
Hassanein, Wessam M.
[J]. 2007 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-3, 2007, : 1523 - 1526
[10] Latency tolerance: A metric for performance analysis of multithreaded architectures
Nemawarkar, SS
Gao, GR
[J]. 11TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM, PROCEEDINGS, 1997, : 227 - 232
