A Universal Approximation Theorem for Mixture-of-Experts Models

Cited by: 18
Authors
Nguyen, Hien D. [1 ]
Lloyd-Jones, Luke R. [2 ]
McLachlan, Geoffrey J. [3 ]
Affiliations
[1] Univ Queensland, Sch Math & Phys, Brisbane, Qld 4072, Australia
[2] Univ Queensland, Queensland Brain Inst, Ctr Neurogenet & Stat Genet, Brisbane, Qld 4072, Australia
[3] Univ Queensland, Sch Math & Phys, Brisbane, Qld 4072, Australia
Keywords
Hierarchical mixtures; Regression
DOI
10.1162/NECO_a_00892
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The mixture-of-experts (MoE) model is a popular neural network architecture for nonlinear regression and classification. The class of MoE mean functions is known to be uniformly convergent to any unknown target function, assuming that the target function is from a Sobolev space that is sufficiently differentiable and that the domain of estimation is a compact unit hypercube. We provide an alternative result, which shows that the class of MoE mean functions is dense in the class of all continuous functions over arbitrary compact domains of estimation. Our result can be viewed as a universal approximation theorem for MoE models. The theorem we present allows MoE users to be confident in applying such models for estimation when data arise from nonlinear and nondifferentiable generative processes.
Pages: 2585-2593
Number of pages: 9
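
To illustrate the approximation property described in the abstract, below is a minimal sketch (not taken from the paper) of a softmax-gated MoE mean function with linear experts. The function name, parameter names, and hand-picked values are illustrative assumptions only. With a steep gate, two linear experts already track the nondifferentiable target |x| on [-1, 1], the kind of target covered by the paper's result but not by differentiability-based approximation guarantees.

```python
import numpy as np

def moe_mean(x, gate_w, gate_b, exp_w, exp_b):
    """Illustrative mean function of a K-expert MoE (softmax gating, linear experts).

    g_k(x) = softmax_k(gate_w[k] * x + gate_b[k])
    mu_k(x) = exp_w[k] * x + exp_b[k]
    m(x)   = sum_k g_k(x) * mu_k(x)
    """
    logits = np.outer(x, gate_w) + gate_b        # (n, K) gating scores
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
    gates = np.exp(logits)
    gates /= gates.sum(axis=1, keepdims=True)    # softmax gate probabilities
    experts = np.outer(x, exp_w) + exp_b         # (n, K) expert mean functions
    return (gates * experts).sum(axis=1)         # MoE mean: gate-weighted experts

# Hand-picked (assumed, not fitted) 2-expert MoE approximating |x| on [-1, 1]:
# a steep gate switches between the experts mu_1(x) = -x and mu_2(x) = x near 0.
x = np.linspace(-1.0, 1.0, 201)
m = moe_mean(x,
             gate_w=np.array([-20.0, 20.0]), gate_b=np.array([0.0, 0.0]),
             exp_w=np.array([-1.0, 1.0]), exp_b=np.array([0.0, 0.0]))
print(f"max |m(x) - |x|| on the grid: {np.max(np.abs(m - np.abs(x))):.4f}")
```

On this grid the maximum deviation is on the order of 0.01 and shrinks as the gating slope grows, which is consistent with the density-in-C(K) statement of the abstract for a continuous, nondifferentiable target.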