Modoru: Clos nanosecond optical switching for distributed deep training [Invited]

Authors
Wang, Cen [1]
Yoshikane, Noboru [1]
Elson, Daniel [1]
Wakayama, Yuta [1]
Soma, Daiki [1]
Beppu, Shohei [1]
Tsuritani, Takehiro [1]
Affiliation
[1] KDDI Res Inc, Photon Div, 2-1-15 Ohara, Saitama 3568502, Japan
Keywords
Optical switches; Topology; Switches; Training; Optical network units; Network topology; Scalability
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Distributed deep training has become a significant consumer of bandwidth in datacenter-scale networks. The diverse parallel strategies used in deep training require different communication patterns, so the network topology must be adapted periodically. Because electrical switching is approaching its capacity limit at high bandwidths and has difficulty adapting topology (i.e., its logical and physical topologies are isomorphic), optical switching has become an attractive way to address these bottlenecks. In this paper, we propose Modoru, a wavelength- and data-rate-agnostic Clos architecture with a switching speed of O(1 ns). Modoru is a drop-in replacement that places no constraints on achieving a high radix. To verify its topological flexibility, we also develop topology-as-a-service, which provisions a sequence of dynamic topologies for training jobs and guarantees high topology availability over the entire network. Large-scale simulations show a basic 7.9× acceleration of deep training jobs using Modoru. Additionally, experiments on the Modoru prototype demonstrate that provisioning adaptive topologies accelerates deep training jobs.
Pages: A40-A52
Page count: 13
Journal: Journal of Optical Communications and Networking, 2024, 16 (01)