Design of a Scalable InfiniBand Topology Service to Enable Network-Topology-Aware Placement of Processes

Authors
Subramoni, H. [1]
Potluri, S. [1]
Kandalla, K. [1]
Barth, B. [3]
Vienne, J. [1]
Keasler, J. [4]
Tomko, K. [2]
Schulz, K. [3]
Moody, A. [4]
Panda, D. K. [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio Supercomputing Ctr, Columbus, OH USA
[3] Texas Adv Comp Ctr, Austin, TX USA
[4] Lawrence Livermore Natl Lab, Livermore, CA USA
Funding
National Science Foundation (USA);
Keywords
PARALLEL; ALGORITHMS;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Over the last decade, InfiniBand has become an increasingly popular interconnect for modern supercomputing systems. However, no existing service can discover the underlying network topology in a scalable manner and expose this information conveniently to runtime libraries and to users of high performance computing systems. In this paper, we design a novel, scalable method to detect the InfiniBand network topology using the Neighbor-Joining (NJ) technique. To the best of our knowledge, this is the first application of the neighbor-joining algorithm to the problem of detecting InfiniBand network topology. We also design a network-topology-aware MPI library that takes advantage of the topology service. The library places the processes of an MPI job in a network-topology-aware manner, with the dual aim of increasing intra-node communication and reducing long-distance inter-node communication across the InfiniBand fabric.
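The abstract names Neighbor-Joining as the core of the detection service but gives no details of how it is applied. As a minimal sketch, assuming a symmetric matrix of pairwise node-to-node distances (for example, hop counts) is already available, the textbook NJ algorithm below builds an unrooted tree whose internal nodes can be read as candidate switches. The function name `neighbor_joining`, the `switchN` labels, and the hop-count input are hypothetical illustrations, not the paper's implementation. NJ recovers a tree exactly only when the distances are additive (a tree metric), so real fabrics with redundant paths, such as fat-trees, would need more care than this sketch shows.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Textbook Neighbor-Joining on a symmetric distance matrix.

    labels: leaf names (hypothetically, compute-node hostnames).
    matrix: matrix[i][j] is the distance between labels[i] and labels[j]
            (hypothetically, measured hop counts); the diagonal must be 0.
    Returns a list of (u, v, branch_length) tree edges.
    """
    # Copy the matrix into a dict-of-dicts so new internal nodes can be added.
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r(i) = sum_k d(i, k) over the currently active nodes.
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        f, g = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # New internal node; in the topology reading, a candidate switch.
        u = f"switch{next_id}"
        next_id += 1
        # Branch lengths from f and g to the new node u.
        df = d[f][g] / 2 + (r[f] - r[g]) / (2 * (n - 2))
        dg = d[f][g] - df
        edges.append((f, u, df))
        edges.append((g, u, dg))
        # Distance from u to every other active node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (f, g):
                dk = (d[f][k] + d[g][k] - d[f][g]) / 2
                d[u][k] = dk
                d[k][u] = dk
        nodes = [k for k in nodes if k not in (f, g)] + [u]
    # Connect the last two active nodes directly.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


if __name__ == "__main__":
    # Four hosts: n0, n1 share one leaf switch; n2, n3 share another.
    hosts = ["n0", "n1", "n2", "n3"]
    hops = [[0, 2, 3, 3],
            [2, 0, 3, 3],
            [3, 3, 0, 2],
            [3, 3, 2, 0]]
    for edge in neighbor_joining(hosts, hops):
        print(edge)
```

On this toy input the sketch attaches `n0` and `n1` to one internal node and `n2` and `n3` to another, joined by a single link. Leaves that share a parent are the natural candidates to co-locate when placing communicating MPI ranks, which is the kind of grouping a topology-aware placement policy could consume.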
Pages: 12