Accelerated Real-time Network Monitoring and Profiling at Scale using OSU INAM

Cited by: 2
Authors
Kousha, P. [1 ]
Raj, Kamal S. D. [1 ]
Subramoni, H. [1 ]
Panda, D. K. [1 ]
Na, H. [2 ]
Dockendorf, T. [2 ]
Tomko, K. [2 ]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Ohio Supercomp Ctr, Columbus, OH USA
Keywords
InfiniBand; Network Monitoring; Profiling; Fabric; Interconnect
DOI
10.1145/3311790.3396672
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Designing a scalable, low-overhead, real-time monitoring and profiling tool for network analysis and introspection that captures all relevant network events is a challenging task. New challenges emerge as HPC systems grow larger and users come to expect better capabilities, such as real-time profiling at fine granularity. We take up this challenge by redesigning OSU INAM so that it can gather, store, retrieve, visualize, and analyze network metrics for large and complex HPC clusters. The enhanced OSU INAM tool provides scalable, low-overhead, fine-grained InfiniBand port counter inquiry and fabric discovery for HPC users, system administrators, and HPC developers. Our experiments show that, for a cluster of 1,428 nodes and 114 switches, the proposed design can gather fabric metrics at very fine (sub-second) granularity and discover the complete network topology in approximately 5 minutes. The proposed design has been publicly released as part of the OSU INAM tool and is available for free download and use from the project website.
Pages: 215-223
Number of pages: 9