Extending collective operations with application semantics for improving multi-cluster performance

Cited by: 0
Authors
Bongo, LA [1]
Anshus, O [1]
Bjorndalen, JM [1]
Larsen, T [1]
Affiliations
[1] Univ Tromso, Dept Comp Sci, N-9001 Tromso, Norway
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Subject classification code
081202;
Abstract
We identify two ways of increasing the performance of allreduce-style collective operations in a multi-cluster with large WAN latencies: (i) hiding latency in system noise, and (ii) conditional allreduce, where knowledge about the application is used to reduce the number of WAN messages. In our multi-cluster, system noise was not large enough to hide the WAN latency. However, the latency could be hidden using conditional allreduce, since on many iterations only cluster-local values were needed, and many of the values needed from other clusters were prefetched. A speedup of 2.4 was achieved for a microbenchmark. Prefetching introduced a small overhead in the cluster with the slowest hosts.
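The idea of a conditional allreduce can be illustrated with a small sketch. The abstract does not say which application condition the authors used, so the example below assumes a hypothetical one: a convergence test on the maximum residual. If the cluster-local maximum already exceeds the tolerance, the global maximum must exceed it too, so the decision needs no WAN message. The names `still_iterating`, `RemoteStub`, and `fetch_max` are invented for illustration and are not taken from the paper.

```python
class RemoteStub:
    """Hypothetical stand-in for the other clusters; counts simulated WAN round trips."""

    def __init__(self, remote_max):
        self.remote_max = remote_max
        self.calls = 0

    def fetch_max(self):
        # Each call models one expensive WAN exchange.
        self.calls += 1
        return self.remote_max


def still_iterating(local_residuals, tol, fetch_remote_max):
    """Return True if the global max residual exceeds tol.

    Assumption (not from the paper): the application only needs this
    boolean, not the exact global maximum.
    """
    local_max = max(local_residuals)  # cluster-local reduction, LAN only
    if local_max > tol:
        # Global max >= local max > tol, so the answer is known locally.
        return True
    # Only now is a value from the other clusters required.
    return max(local_max, fetch_remote_max()) > tol


if __name__ == "__main__":
    remote = RemoteStub(remote_max=0.5)
    print(still_iterating([0.9, 0.2], 0.1, remote.fetch_max), remote.calls)
    print(still_iterating([0.05, 0.02], 0.1, remote.fetch_max), remote.calls)
```

In this sketch the first call is decided from local values alone, while the second has to contact the remote side. The prefetching described in the abstract would go one step further and request remote values ahead of the iteration that needs them; that part is not modeled here.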
Pages: 320-327
Number of pages: 8