Multiclass classification of distributed memory parallel computations

Cited by: 7
Authors
Whalen, Sean [1]
Peisert, Sean [2, 3]
Bishop, Matt [3]
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[2] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[3] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
Keywords
Multiclass classification; Bayesian networks; Random forests; Self-organizing maps; High performance computing; Communication patterns; Network motifs
DOI
10.1016/j.patrec.2012.10.007
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High Performance Computing (HPC) is a field concerned with solving large-scale problems in science and engineering. However, the computational infrastructure of HPC systems can also be misused, as demonstrated by the recent commoditization of cloud computing resources on the black market. As a first step towards addressing this, we introduce a machine learning approach for classifying distributed parallel computations based on communication patterns between compute nodes. We first provide relevant background on message passing and computational equivalence classes called dwarfs, and describe our exploratory data analysis using self-organizing maps. We then present our classification results across 29 scientific codes using Bayesian networks and compare their performance against Random Forest classifiers. These models, trained with hundreds of gigabytes of communication logs collected at Lawrence Berkeley National Laboratory, perform well without any a priori information and address several shortcomings of previous approaches. (C) 2012 Elsevier B.V. All rights reserved.
Pages: 322-329
Number of pages: 8
Related papers
50 in total
  • [21] ASYNCHRONOUS DISTRIBUTED SIMULATION VIA A SEQUENCE OF PARALLEL COMPUTATIONS
    CHANDY, KM
    MISRA, J
    COMMUNICATIONS OF THE ACM, 1981, 24 (04) : 198 - 206
  • [22] Parallel processing for boundary element computations on distributed systems
    Song, SW
    Baddour, RE
    ENGINEERING ANALYSIS WITH BOUNDARY ELEMENTS, 1997, 19 (01) : 73 - 84
  • [23] Efficient Data-parallel Computations on Distributed Systems
    曾志勇
    High Technology Letters, 2002, (03) : 92 - 96
  • [24] Affinity Driven Distributed Scheduling Algorithm for Parallel Computations
    Narang, Ankur
    Srivastava, Abhinav
    Kumar, Naga Praveen
    Shyamasundar, Rudrapatna K.
    DISTRIBUTED COMPUTING AND NETWORKING, 2011, 6522 : 167 - +
  • [25] Performance driven distributed scheduling of parallel hybrid computations
    Narang, Ankur
    Shyamasundar, Rudrapatna K.
    THEORETICAL COMPUTER SCIENCE, 2011, 412 (32) : 4212 - 4225
  • [26] Parallel and distributed evolutionary computations for multimodal function optimization
    Rupela, V
    Dozier, G
    MULTIMEDIA, IMAGE PROCESSING AND SOFT COMPUTING: TRENDS, PRINCIPLES AND APPLICATIONS, 2002, 13 : 307 - 312
  • [27] Complexity-based parallel rule induction for multiclass classification
    Asadi, Shahrokh
    Shahrabi, Jamal
    INFORMATION SCIENCES, 2017, 380 : 53 - 73
  • [28] Communication Lower Bounds for Distributed-Memory Computations
    Scquizzato, Michele
    Silvestri, Francesco
    31ST INTERNATIONAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE (STACS 2014), 2014, 25 : 627 - 638
  • [29] Adaptive scheduling of computations and communications on distributed memory systems
    Al-Mouhamed, M
    Najjari, H
    1998 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PROCEEDINGS, 1998, : 366 - 373
  • [30] Load balancing problems for multiclass jobs in distributed/parallel computer systems
    Li, J
    Kameda, H
    IEEE TRANSACTIONS ON COMPUTERS, 1998, 47 (03) : 322 - 332