Trends in algorithms for nonuniform applications on hierarchical distributed architectures

Cited by: 0
Author
Keyes, DE [1]
Affiliation
[1] Old Dominion Univ, Dept Math & Stat, Norfolk, VA 23529 USA
Keywords
DOI
Not available
Chinese Library Classification number
V [Aeronautics, Astronautics];
Discipline classification code
08; 0825;
Abstract
Scientific programmers are accustomed to expressing in their programs the "who" (variable declarations) and the "what" (operations), in some sequentialized order, and leaving to the systems software and hardware the questions of "when" and "where". This act of delegation is appropriate at the small scales, since programmer management of pipelines, multiple functional units, and multilevel caches is presently beyond reward, and the depth and complexity of such performance-motivated architectural developments are sure to increase. However, disregard for the differential costs of accessing different locations in memory (the "flat memory" model) can put unnecessary amounts of synchronization and data motion on the critical path of program execution. Different organizations of algorithms leading to mathematically equivalent results can have very different levels of exposed synchronization and data motion, and algorithmicists of the future will have to be conscious of and adapt to the distributed and hierarchical aspects of memory architecture. Plenty of examples of architecturally motivated algorithmic adaptations can be given today; we illustrate herein with examples from recent aerodynamics simulations. For this purpose, pseudo-transient Newton-Krylov-Schwarz methods are briefly introduced and their parallel scalability in bulk synchronous SPMD applications is explored. We also indicate some fundamental limitations of bulk synchronous implicit solvers and propose asynchronous forms of nonlinear Schwarz methods as perhaps better adapted both to massively parallel architectures and strongly nonuniform applications.
Suitably adapted PDE solvers seem to be readily extrapolated to the 100 Tflop/s capabilities envisioned in the coming decade. Making use of some novel quantitative metrics for the memory access efficiencies of high performance applications ("memtropy") and for the local strength of nonlinearity ("tensoricity") in applications with spatially nonuniform characteristics, we propose a migration path for scientific and engineering simulations towards the distributed and hierarchical Teraflops world, and we consider what simulations in this world will look like.
Pages: 103-137
Number of pages: 35