Adaptive Gating in Mixture-of-Experts based Language Models

Cited by: 0
Authors:
Li, Jiamin [1]
Su, Qiang [1]
Yang, Yitao [2]
Jiang, Yimin
Wang, Cong [1]
Xu, Hong [2]
Affiliations:
[1] City University of Hong Kong, Hong Kong, China
[2] The Chinese University of Hong Kong, Hong Kong, China
Keywords: none listed
DOI: not available
CLC Number: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Large language models, such as OpenAI's ChatGPT, have demonstrated exceptional language understanding capabilities across various NLP tasks. Sparsely activated mixture-of-experts (MoE) has emerged as a promising solution for scaling model capacity while keeping the number of computational operations constant. Existing MoE models adopt a fixed gating network in which every token is processed by the same number of experts. This contradicts the intuition that tokens in a sequence vary in linguistic complexity and therefore warrant different computational budgets, and prior research says little about the trade-off between per-token computation and model performance. This paper introduces adaptive gating in MoE, a flexible training strategy that allows tokens to be processed by a variable number of experts depending on the expert probability distribution. The proposed framework preserves sparsity while improving training efficiency, and curriculum learning is leveraged to further reduce training time. Extensive experiments on diverse NLP tasks show that adaptive gating reduces training time by up to 22.5% while maintaining inference quality. Moreover, we conduct a comprehensive analysis of the routing decisions and present our insights on using adaptive gating.
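The abstract describes routing each token to a variable number of experts according to the gate's probability distribution. Below is a minimal PyTorch sketch of one plausible threshold rule: a token keeps only its top-1 expert when the gate is confident and falls back to top-2 otherwise. The confidence threshold `tau`, the specific top-1/top-2 rule, and the function `adaptive_gate` are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of adaptive top-k gating (assumed rule, not the
# authors' exact method): route a token to one expert when the router
# is confident, and to two experts when it is not.
import torch
import torch.nn.functional as F

def adaptive_gate(logits: torch.Tensor, tau: float = 0.7):
    """logits: (num_tokens, num_experts) raw router outputs.

    Returns per-token expert indices, renormalized combine weights,
    and a mask marking which tokens use two experts.
    """
    probs = F.softmax(logits, dim=-1)          # (T, E) gate probabilities
    top2_p, top2_i = probs.topk(2, dim=-1)     # (T, 2) values and indices
    use_two = top2_p[:, 0] < tau               # low confidence -> 2 experts
    weights = top2_p.clone()
    weights[~use_two, 1] = 0.0                 # confident tokens drop expert #2
    weights = weights / weights.sum(dim=-1, keepdim=True)  # weights sum to 1
    return top2_i, weights, use_two

if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(8, 4)                 # 8 tokens, 4 experts
    _, _, two = adaptive_gate(logits)
    print("experts per token:", (1 + two.int()).tolist())
```

In a real MoE layer the zero-weight second expert would simply not be dispatched, so confident tokens incur roughly half the expert FLOPs of top-2 routing, which is the source of the training-time savings the abstract reports.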
Pages: 3577-3587 (11 pages)