A flexible probabilistic framework for large-margin mixture of experts

Cited by: 0
Authors
Archit Sharma
Siddhartha Saxena
Piyush Rai
Affiliations
[1] Google Brain, Google AI Resident
[2] IIT Kanpur
Source
Machine Learning | 2019 / Vol. 108
Keywords
Probabilistic modelling; Mixture of experts; Bayesian SVMs;
DOI
Not available
Abstract
Mixture-of-Experts (MoE) models enable learning highly nonlinear functions by combining simple expert models. Each expert handles a small region of the data space, as dictated by the gating network, which generates the (soft) assignment of each input to the corresponding experts. Despite their flexibility and the renewed interest they have received lately, existing MoE constructions pose several difficulties during model training. Crucially, neither of the two popular gating networks used in MoE, namely the softmax gating network and the hierarchical gating network (the latter used in the hierarchical mixture of experts), has an efficient inference algorithm. The problem is further exacerbated if the experts do not have a conjugate likelihood and lack a natural probabilistic formulation (e.g., logistic regression or large-margin classifiers such as the SVM). To address these issues, we develop novel inference algorithms with closed-form parameter updates, leveraging some of the recent advances in data augmentation techniques. We also present a novel probabilistic framework for MoE, consisting of a range of gating networks for which our proposed algorithms make efficient inference possible. We exploit this framework by using Bayesian linear SVMs (which otherwise have a non-conjugate likelihood) as experts on various classification problems, endowing our final model with attractive large-margin properties. We show that our models are significantly more efficient to train than other MoE training algorithms, while outperforming traditional nonlinear models such as kernel SVMs and Gaussian processes on several benchmark datasets.
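To make the soft-assignment idea concrete, the following minimal Python sketch (an illustration, not the authors' implementation) shows how a softmax-gated mixture of linear experts forms a prediction: the gating network turns each input into a probability vector over experts, and the output is the gate-weighted sum of the experts' linear (SVM-style) decision scores. All names here (softmax, moe_decision, gate_W, expert_W) are hypothetical.

import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the expert dimension.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_decision(X, gate_W, expert_W):
    # X: (n, d) inputs; gate_W: (d, K) gating weights; expert_W: (d, K) weights
    # of K linear (SVM-style) experts. Returns (n,) combined decision scores.
    gates = softmax(X @ gate_W, axis=1)         # (n, K) soft assignments to experts
    expert_scores = X @ expert_W                # (n, K) each expert's linear score
    return (gates * expert_scores).sum(axis=1)  # gate-weighted combination

# Toy usage: 5 inputs in 3 dimensions, 4 experts; the sign of each score gives the class.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
print(moe_decision(X, gate_W=rng.normal(size=(3, 4)), expert_W=rng.normal(size=(3, 4))))

In the paper's setting the experts are Bayesian linear SVMs and the gating and expert parameters are learned with closed-form updates via data augmentation; the sketch above only illustrates the gate-weighted combination at prediction time.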
Pages: 1369 - 1393
Number of pages: 24