The Case for Domain-Specific Networks

Authors
Abts, Dennis [1]
Kim, John [2]
Affiliations
[1] NVIDIA, Santa Clara, CA 95050 USA
[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea
DOI
10.1109/HOTI59126.2023.00021
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Modern parallel computers are dichotomized into capacity or capability systems. Capacity systems cater to a wide range of weak scaling workloads, using distributed parallel systems with message passing, while capability systems focus on strong scaling workloads across a significant fraction of the machine's processing units. The interconnection network differs under these regimes, with commodity Ethernet or InfiniBand solutions typically deployed for capacity systems, while capability-class systems often necessitate tightly-coupled, fine-grained communication. Systems built for AI training and inference embody traits from both classes: tight coupling and strong scaling for model parallelism, and weak scaling for data parallelism in a distributed system. Handling 100-billion-parameter large language models and trillion-token data sets presents computational challenges for current supercomputing infrastructure. This paper discusses the crucial role of the interconnection network in these large-scale systems, advocating for flexible, low-latency interconnects that can deliver high throughput at large scales with tens of thousands of endpoints. This work also emphasizes the importance of reliability and resilience in enduring long-running training workloads and demanding inference requirements of domain-specific workloads.
Pages: 49-52
Page count: 4