A Survey on Malleability Solutions for High-Performance Distributed Computing

Cited by: 6
Authors
Aliaga, Jose I. [1]
Castillo, Maribel [1]
Iserte, Sergio [1]
Martin-Alvarez, Iker [1]
Mayo, Rafael [1]
Affiliations
[1] Univ Jaume 1, Dept Ingn & Ciencia Comp, Castellon de La Plana 12006, Spain
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 10
Keywords
exascale; job reconfiguration; MPI; data redistribution; resource management; adaptive workloads; MPI APPLICATIONS; FRAMEWORK;
DOI
10.3390/app12105231
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Maintaining a high rate of productivity, in terms of completed jobs per unit of time, in High-Performance Computing (HPC) facilities is a cornerstone of the next generation of exascale supercomputers. Process malleability is presented as a straightforward mechanism to address that issue. Nowadays, the vast majority of HPC facilities are intended for distributed-memory applications based on the Message Passing (MP) paradigm. For this reason, many efforts are based on the Message Passing Interface (MPI), the de facto standard programming model. Malleability aims to rescale executions on the fly; in other words, to reconfigure the number and layout of processes in running applications. Process malleability involves reallocating resources within the HPC system, handling the processes of the application, and redistributing data among those processes to resume the execution. This manuscript compiles how different frameworks address process malleability, their main features, their integration in resource management systems, and how they may be used in user codes. This paper is a detailed state-of-the-art review devised as an entry point for researchers who are interested in process malleability.
Pages: 32
Related papers
50 in total
  • [21] Software tools for high-performance computing: survey and recommendations
    Georgia Inst of Technology, Atlanta, United States
    Scientific Programming, 5 (03) : 239 - 249
  • [22] A survey on resource allocation in high performance distributed computing systems
    Hussain, Hameed
    Malik, Saif Ur Rehman
    Hameed, Abdul
    Khan, Samee Ullah
    Bickler, Gage
    Min-Allah, Nasro
    Qureshi, Muhammad Bilal
    Zhang, Limin
    Wang Yongji
    Ghani, Nasir
    Kolodziej, Joanna
    Zomaya, Albert Y.
    Xu, Cheng-Zhong
    Balaji, Pavan
    Vishnu, Abhinav
    Pinel, Fredric
    Pecero, Johnatan E.
    Kliazovich, Dzmitry
    Bouvry, Pascal
    Li, Hongxiang
    Wang, Lizhe
    Chen, Dan
    Rayes, Ammar
    PARALLEL COMPUTING, 2013, 39 (11) : 709 - 736
  • [23] A case study of a distributed high-performance computing system for neurocomputing
    Anguita, D
    Boni, A
    Parodi, G
    JOURNAL OF SYSTEMS ARCHITECTURE, 2000, 46 (05) : 429 - 438
  • [24] Special issue on grid computing, high-performance and distributed applications
    Herrero, Pilar
    Perez, Maria S.
    MULTIAGENT AND GRID SYSTEMS, 2007, 3 (04) : 353 - 354
  • [25] Orlando Tools: Supporting High-performance Computing in Distributed Environments
    Gorsky, Sergey
    Kostromin, Roman
    Feoktistov, Alexander
    Bychkov, Igor
    2020 VI INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND NANOTECHNOLOGY (IEEE ITNT-2020), 2020,
  • [26] Special Section: Grid computing, high-performance and distributed applications
    Herrero, Pilar
    Katz, Daniel S.
    Perez, Maria S.
    Talia, Domenico
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2010, 26 (02) : 257 - 258
  • [27] Energy-efficient high-performance parallel and distributed computing
    Samee Ullah Khan
    Pascal Bouvry
    Thomas Engel
    The Journal of Supercomputing, 2012, 60 : 163 - 164
  • [28] Energy-efficient high-performance parallel and distributed computing
    Khan, Samee Ullah
    Bouvry, Pascal
    Engel, Thomas
    JOURNAL OF SUPERCOMPUTING, 2012, 60 (02) : 163 - 164
  • [29] A High-Performance Parallel Approach to Image Processing in Distributed Computing
    Rakhimov, Mekhriddin
    Mamadjanov, Doniyor
    Mukhiddinov, Abulkosim
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2020), 2020,
  • [30] High-performance grid computing via distributed data access
    Andrews, P
    Banister, B
    Kovatch, P
    PDPTA '04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS 1-3, 2004, : 1366 - 1373