Optimizing UPC programs for multi-core systems

Author
Zheng, Yili [1]
Affiliation
[1] Univ Calif Berkeley, Lawrence Berkeley Lab, Berkeley, CA 94720 USA
Keywords
UPC; PGAS
DOI
10.1155/2010/646829
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
The Partitioned Global Address Space (PGAS) model of Unified Parallel C (UPC) can help users express and manage application data locality on non-uniform memory access (NUMA) multi-core shared-memory systems to achieve good performance. First, we describe several UPC program optimization techniques that are important for good performance on NUMA multi-core computers, illustrating them with examples and quantitative performance results. Second, we use two numerical computing kernels, parallel matrix-matrix multiplication and parallel 3-D FFT, to demonstrate the end-to-end development and optimization of UPC applications. Our results show that the optimized UPC programs achieve very good, scalable performance on current multi-core systems and can even outperform vendor-optimized libraries in some cases.
Pages: 183-191
Number of pages: 9