New Performance Modeling Methods for Parallel Data Processing Applications

Cited by: 11
Authors
Bhimani, Janki [1]
Mi, Ningfang [1]
Leeser, Miriam [1]
Yang, Zhengyu [1]
Affiliation
[1] Northeastern Univ, 360 Huntington Ave, Boston, MA 02115 USA
Funding
National Science Foundation (USA)
Keywords
Performance modeling; queuing theory; Markov model; distributed systems; execution time; parallel calculation; communication network; prediction;
DOI
10.1145/3309684
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Predicting the performance of an application running on parallel computing platforms is becoming increasingly important because of its influence on development time and resource management. However, predicting the performance with respect to parallel processes is complex for iterative and multi-stage applications. This research proposes a performance approximation approach, FiM, that predicts the calculation time (with FiM-Cal) and the communication time (with FiM-Com) of an application running on a distributed framework. FiM-Cal consists of two coupled key components: (1) a stochastic Markov model that captures non-deterministic runtime, which often depends on parallel resources such as the number of processes, and (2) a machine-learning model that extrapolates the parameters for calibrating the Markov model when application parameters such as the dataset change. In addition to parallel calculation time, parallel computing platforms spend data transfer time communicating among different nodes. FiM-Com consists of a simulation queuing model that quickly estimates this communication time. Our new modeling approach considers different design choices along multiple dimensions, namely (i) process-level parallelism, (ii) distribution of cores on a multi-processor platform, (iii) application-related parameters, and (iv) characteristics of datasets. The major contribution of our prediction approach is that FiM can provide an accurate prediction of parallel processing time for datasets that are much larger than the training datasets. We evaluate our approach with the NAS Parallel Benchmarks and real iterative data processing applications. We compare the predicted results (e.g., end-to-end execution time) with actual experimental measurements on a real distributed platform. We also compare our work with an existing prediction technique based on machine learning. We rank the number of processes according to the actual and predicted results from FiM and calculate the correlation between the actual and predicted rankings. Our results show that FiM obtains a high correlation in the range of 0.80 to 0.99, which indicates considerable accuracy of our technique. Such prediction gives data analysts useful insight into the optimal configuration of parallel resources (e.g., number of processes and number of cores) and also helps system designers investigate the impact of changes in application parameters on system performance.
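The ranking-based evaluation described in the abstract can be illustrated with a short sketch. The abstract does not say which correlation coefficient is used, so this example assumes Spearman's rank correlation, computed as the Pearson correlation of the two rank vectors. The function names (`ranks`, `pearson`, `spearman`) and all timing values are hypothetical and are not taken from the paper.

```python
import math

def ranks(values):
    """Return 1-based ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def spearman(actual, predicted):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(actual), ranks(predicted))

# Hypothetical execution times (seconds) for five process counts.
process_counts = [1, 2, 4, 8, 16]
actual_times = [120.0, 65.0, 38.0, 30.0, 34.0]
predicted_times = [118.0, 70.0, 36.0, 33.0, 31.0]

rho = spearman(actual_times, predicted_times)
print(f"rank correlation: {rho:.2f}")
```

In this made-up data the prediction swaps the order of the two fastest configurations (8 and 16 processes), so the rankings agree everywhere else and the coefficient falls just below 1. A value near 1 means the predicted times order the process counts almost the same way the measurements do, which is what matters when choosing a configuration.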
Pages: 24