Bandwidth based performance optimization of Multi-threaded applications

Cited by: 2
Authors
Manakkadu, Sheheeda [1]
Dutta, Sourav [1]
Affiliation
[1] So Illinois Univ, Dept Elect & Comp Engn, Carbondale, IL 62901 USA
DOI
10.1109/PAAP.2014.51
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Multiple threads running on a multi-core processor can significantly improve the performance of a parallel application. However, effective scaling of threads and cores plays a key role in achieving optimal performance, because performance does not necessarily improve as the number of cores increases. Multi-threaded applications suffer from thread synchronization and from negative interference in shared memory, including the last-level cache and main memory. Memory bandwidth also often limits the performance of a multi-threaded workload. In this paper, we propose a method to achieve optimal scalability on a multi-core platform and to predict the bandwidth requirement of parallel workloads for a given number of threads. We employ the proposed method to improve the performance of bandwidth-limited parallel applications. We find that DRAM access has various phases, and we use the highest bandwidth among all phases to predict the performance of a given workload in a multi-threaded environment. We evaluate the proposed method using the Gem5 multi-core simulator; the experimental results show that the phase-based bandwidth utilization method can estimate the optimal number of threads for a given parallel workload with low prediction error.
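The abstract does not give the prediction model itself, so the following Python sketch is only one plausible reading of the idea of using the highest per-phase bandwidth to bound the thread count. It assumes that bandwidth demand grows linearly with the number of threads, which the abstract does not state, and every name in it (`peak_phase_bandwidth`, `estimate_thread_limit`, the GB/s figures) is hypothetical rather than taken from the paper.

```python
from typing import Sequence


def peak_phase_bandwidth(phase_bandwidths_gbps: Sequence[float]) -> float:
    """Return the highest bandwidth observed across DRAM access phases."""
    if not phase_bandwidths_gbps:
        raise ValueError("need at least one phase measurement")
    return max(phase_bandwidths_gbps)


def estimate_thread_limit(
    phase_bandwidths_gbps: Sequence[float],
    available_gbps: float,
    max_threads: int,
) -> int:
    """Estimate how many threads fit within the available memory bandwidth.

    ASSUMPTION (not from the paper): each thread demands the single-thread
    peak-phase bandwidth, so total demand is peak * threads.
    """
    peak = peak_phase_bandwidth(phase_bandwidths_gbps)
    if peak <= 0:
        # No measurable DRAM traffic: bandwidth is not the limiting factor.
        return max_threads
    # Largest thread count whose assumed demand stays within the budget.
    limit = int(available_gbps // peak)
    # Always allow at least one thread, never more than the core budget.
    return max(1, min(max_threads, limit))


if __name__ == "__main__":
    # Hypothetical single-thread bandwidth per phase, in GB/s.
    phases = [1.2, 3.5, 2.0]
    print(estimate_thread_limit(phases, available_gbps=12.8, max_threads=16))
```

With these made-up numbers the peak phase needs 3.5 GB/s per thread, so three threads (10.5 GB/s) fit within a 12.8 GB/s budget while four (14 GB/s) do not. Sizing against the peak phase rather than the average is a conservative choice: it avoids saturating memory during the most demanding phase, at the cost of possibly leaving bandwidth unused in the others.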
Pages: 118-122
Page count: 5