Trends in algorithms for nonuniform applications on hierarchical distributed architectures

Author
Keyes, DE [1]
Affiliation
[1] Old Dominion Univ, Dept Math & Stat, Norfolk, VA 23529 USA
DOI
Not available
Chinese Library Classification (CLC) number
V [Aeronautics, Astronautics]
Discipline classification code
08; 0825
Abstract
Scientific programmers are accustomed to expressing in their programs the "who" (variable declarations) and the "what" (operations), in some sequentialized order, and leaving to the systems software and hardware the questions of "when" and "where". This act of delegation is appropriate at small scales, since programmer management of pipelines, multiple functional units, and multilevel caches is presently beyond reward, and the depth and complexity of such performance-motivated architectural developments are sure to increase. However, disregard for the differential costs of accessing different locations in memory (the "flat memory" model) can put unnecessary amounts of synchronization and data motion on the critical path of program execution. Different organizations of algorithms leading to mathematically equivalent results can have very different levels of exposed synchronization and data motion, and algorithmicists of the future will have to be conscious of, and adapt to, the distributed and hierarchical aspects of memory architecture. Plenty of examples of architecturally motivated algorithmic adaptations can be given today; we illustrate herein with examples from recent aerodynamics simulations. For this purpose, pseudo-transient Newton-Krylov-Schwarz methods are briefly introduced and their parallel scalability in bulk synchronous SPMD applications is explored. We also indicate some fundamental limitations of bulk synchronous implicit solvers and propose asynchronous forms of nonlinear Schwarz methods as perhaps better adapted both to massively parallel architectures and to strongly nonuniform applications.
Suitably adapted PDE solvers seem to be readily extrapolated to the 100 Tflop/s capabilities envisioned in the coming decade. Making use of some novel quantitative metrics for the memory access efficiencies of high performance applications ("memtropy") and for the local strength of nonlinearity ("tensoricity") in applications with spatially nonuniform characteristics, we propose a migration path for scientific and engineering simulations towards the distributed and hierarchical Teraflops world, and we consider what simulations in this world will look like.
Pages: 103-137
Number of pages: 35
Related papers
50 in total
  • [31] A Framework for Parallel Genetic Algorithms for Distributed Memory Architectures
    Georgiev, Dobromir
    Atanassov, Emanouil
    Alexandrov, Vassil
    [J]. 2014 5TH WORKSHOP ON LATEST ADVANCES IN SCALABLE ALGORITHMS FOR LARGE-SCALE SYSTEMS (SCALA), 2014: 47-53
  • [32] Parameter estimation algorithms for hierarchical distributed systems
    Al-Dabass, D
    Zreiba, A
    Evans, DJ
    Sivayoganathan, S
    [J]. INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2002, 79(01): 65-88
  • [33] Architectures, Algorithms, and Applications Using Bayesian Networks
    Kingsbury, Todd
    [J]. MULTISENSOR, MULTISOURCE INFORMATION FUSION: ARCHITECTURES, ALGORITHMS, AND APPLICATIONS 2011, 2011, 8064
  • [34] Latest trends in computer architectures and parallel and distributed technologies
    Schulze, Bruno
    Rebello, Vinod
    Moreira, Jose
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2013, 25(06): 771-774
  • [35] Gecko: Hierarchical Distributed View of Heterogeneous Shared Memory Architectures
    Ghane, Millad
    Chandrasekaran, Sunita
    Cheung, Margaret S.
    [J]. PROCEEDINGS OF THE TENTH INTERNATIONAL WORKSHOP ON PROGRAMMING MODELS AND APPLICATIONS FOR MULTICORES AND MANYCORES (PMAM 2019), 2019: 21-30
  • [36] Technology Scaling in FPGAs: Trends in Applications and Architectures
    Shannon, Lesley
    Cojocaru, Veronica
    Cong Nguyen Dao
    Leong, Philip H. W.
    [J]. 2015 IEEE 23RD ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2015: 1-8
  • [37] Hierarchical distributed architectures for autonomous mobile robots: a case study
    Azevedo, Jose Luis
    Cunha, Bernardo
    Almeida, Luis
    [J]. ETFA 2007: 12TH IEEE INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES AND FACTORY AUTOMATION, VOLS 1-3, 2007: 973-980
  • [38] An analytical comparison of distributed and hierarchical Web-caching architectures
    Hurley, RT
    Feng, W
    Li, BY
    [J]. COMPUTERS AND THEIR APPLICATIONS, 2003: 291-295
  • [39] On generating distributed intelligence systems architectures using genetic algorithms
    Zaidi, AK
    Levis, AH
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 1998, 28(03): 453-459
  • [40] LU factorization algorithms on distributed-memory multiprocessor architectures
    Geist, GA
    Romine, CH
    [J]. SIAM JOURNAL ON SCIENTIFIC AND STATISTICAL COMPUTING, 1988, 9(04): 639-649