Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

Cited by: 102
Authors
Iwasawa, Masaki [1]
Tanikawa, Ataru [1, 2]
Hosono, Natsuki [1]
Nitadori, Keigo [1]
Muranushi, Takayuki [1]
Makino, Junichiro [1, 3, 4]
Affiliations
[1] RIKEN Adv Inst Computat Sci, 7-1-26 Minatojima Minami Machi, Kobe, Hyogo 6500047, Japan
[2] Univ Tokyo, Dept Earth & Astron, Coll Arts & Sci, Meguro Ku, 3-8-1 Komaba, Tokyo 1538902, Japan
[3] Kobe Univ, Grad Sch Sci, Dept Planetol, Nada Ku, 1-1 Rokkodai Cho, Kobe, Hyogo 6578501, Japan
[4] Tokyo Inst Technol, Earth Life Sci Inst, Meguro Ku, 2-12-1 Ookayama, Tokyo 1528551, Japan
Keywords
dark matter; Galaxy: evolution; methods: numerical; planets and satellites: formation; SPECIAL-PURPOSE COMPUTER; SIMD INSTRUCTION SET; N-BODY SIMULATION; TREE-CODE; HYDRODYNAMICS; DYNAMICS; GALAXIES; SYSTEMS; SPH;
DOI
10.1093/pasj/psw053
Chinese Library Classification number
P1 [Astronomy];
Discipline classification code
0704;
Abstract
We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9). These are currently limited by the time for the calculation of the domain decomposition and communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
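As an illustration of the "simple, sequential and unoptimized program of O(N^2) calculation cost" that the abstract takes as its baseline, the following minimal Python sketch computes gravitational accelerations by direct summation over all particle pairs. It is not FDPS code and does not use the FDPS API (FDPS itself is a C++ template library). The function name `direct_gravity`, the softening parameter `eps`, and the default `G = 1` are assumptions made for this example only.

```python
import math

def direct_gravity(pos, mass, eps=1e-3, G=1.0):
    """Return accelerations from direct pairwise summation, O(N^2) cost.

    pos  : list of [x, y, z] positions (hypothetical input layout)
    mass : list of particle masses
    eps  : Plummer softening length (an assumption, not from the source)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip self-interaction
            # Separation vector pointing from particle i to particle j.
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * dx[k] * inv_r3
    return acc
```

The double loop is the part that a tree algorithm or the Fast Multipole Method would replace in a production code; according to the abstract, FDPS supplies that replacement, together with domain decomposition and particle exchange, while the user writes only a kernel of roughly this complexity.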
Pages: 22