A Machine-Learning Approach for Communication Prediction of Large-Scale Applications

Cited by: 6
Authors
Papadopoulou, Nikela [1]
Goumas, Georgios [1]
Koziris, Nectarios [1]
Affiliations
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, GR-10682 Athens, Greece
Keywords
performance prediction; communication time; MPI applications; large-scale systems
DOI
10.1109/CLUSTER.2015.27
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper we present a machine-learning approach to predict the total communication time of parallel applications. Communication time depends heavily on a wide set of parameters related to the architecture, the runtime configuration, and the application's communication profile. We focus our study on parameters that can be easily extracted from the application and the process mapping ahead of execution. To this end, we define a small set of descriptive metrics and build a simple benchmark that can sweep over the parameter space in a straightforward way. We use this benchmarking data to train a robust multiple-variable regression model, which serves as our communication predictor. Our experimental results show notable accuracy in predicting the communication time of two indicative application kernels on a supercomputer, using from a few dozen to a few thousand processing cores.
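The workflow the abstract outlines (collect benchmark measurements, fit a regression model, predict communication time from metrics known before execution) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three feature names, the synthetic numbers, and the use of plain ordinary least squares in place of the robust regression the authors mention. It is not the paper's model or data.

```python
# Illustrative sketch only: ordinary least squares on synthetic "benchmark" rows.
# Feature names and all numbers are hypothetical, not taken from the paper.

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_model(features, targets):
    """Fit targets ~ w0 + w1*x1 + ... + wk*xk via the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature_row):
    """Predicted communication time for one configuration."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


# Hypothetical metrics per benchmark run:
# (message volume in MB, number of messages, fraction of off-node traffic)
benchmark_features = [
    (1.0, 10.0, 0.0),
    (2.0, 10.0, 0.5),
    (4.0, 20.0, 0.25),
    (8.0, 40.0, 1.0),
    (16.0, 5.0, 0.75),
    (3.0, 30.0, 0.1),
]

# Synthetic "measured" times generated from a known linear law, so the fit
# can be checked against the coefficients used to generate the data.
benchmark_times = [
    0.5 + 0.2 * mb + 0.01 * count + 1.5 * off_node
    for mb, count, off_node in benchmark_features
]

weights = fit_linear_model(benchmark_features, benchmark_times)
predicted_time = predict(weights, (5.0, 15.0, 0.5))
print(weights, predicted_time)
```

Because the synthetic times follow an exact linear law, the fitted weights should match the generating coefficients up to rounding. Real measurements are noisy and may contain outliers, which is one reason a robust estimator could be preferable to the plain least-squares fit used in this sketch.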
Pages: 120-123
Number of pages: 4