Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

Cited by: 20
Authors:
Gross, Sam [1]
Ranzato, Marc'Aurelio [1]
Szlam, Arthur [1]
Affiliation:
[1] Facebook AI Research, Menlo Park, CA, USA
DOI: 10.1109/CVPR.2017.540
CLC Number: TP18 (Artificial Intelligence Theory)
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Training convolutional networks (CNNs) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large CNNs that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture-of-experts model can be efficiently trained to good effect on large-scale hashtag (multilabel) prediction tasks. Mixture-of-experts models are not new [7, 3], but in the past researchers have had to devise sophisticated methods to deal with data fragmentation. We show empirically that modern weakly supervised datasets are large enough to support a naive partitioning scheme in which each data point is assigned to a single expert. Because the experts are independent, training them in parallel is easy, and evaluation is cheap for the size of the model. Furthermore, we show that a single decoding layer can be used for all the experts, yielding a unified feature embedding space. We demonstrate that it is feasible (and in fact relatively painless) to train far larger models than could practically be trained with standard CNN architectures, and that the extra capacity is put to good use on current datasets.
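The abstract's recipe (hard-assign each example to exactly one expert, train the experts independently, decode all of them through a single shared layer) can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch sketch: the class name HardMoE, the nearest-centroid routing step standing in for the paper's partitioning, and all parameter names are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a hard mixture of experts, assuming PyTorch.
# All names (HardMoE, trunk, centroids, ...) are illustrative stand-ins
# for the scheme described in the abstract, not the authors' code.
import torch
import torch.nn as nn

class HardMoE(nn.Module):
    def __init__(self, trunk, experts, centroids, embed_dim, num_labels):
        super().__init__()
        self.trunk = trunk                      # small CNN used only for routing
        self.experts = nn.ModuleList(experts)   # independently trainable expert CNNs
        # (K, D) cluster centers over trunk features, one per expert
        self.register_buffer("centroids", centroids)
        # single decoding layer shared by all experts -> unified embedding space
        self.decoder = nn.Linear(embed_dim, num_labels)

    def route(self, x):
        # hard assignment: each image goes to exactly one expert
        f = self.trunk(x)                                     # (B, D) routing features
        return torch.cdist(f, self.centroids).argmin(dim=1)  # (B,) expert indices

    def forward(self, x):
        idx = self.route(x)
        logits = x.new_empty(x.size(0), self.decoder.out_features)
        for k in idx.unique():                   # run only the experts that were selected
            sel = idx == k
            emb = self.experts[int(k)](x[sel])   # (n_k, embed_dim) expert embedding
            logits[sel] = self.decoder(emb)      # shared decoder across all experts
        return logits                            # multilabel logits
```

Because each expert only ever sees its own partition, the experts can be trained on separate GPUs with no communication between them; at test time only one expert runs per image, so evaluation cost is roughly that of a single CNN despite the much larger total parameter count.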
Pages: 5085-5093 (9 pages)
Related Papers (50 records in total)
  • [1] Scaling large learning problems with hard parallel mixtures
    Collobert, R.; Bengio, Y.; Bengio, S.
    Pattern Recognition with Support Vector Machines, Proceedings, 2002, 2388: 8-23
  • [2] Scaling large learning problems with hard parallel mixtures
    Collobert, R.; Bengio, Y.; Bengio, S.
    International Journal of Pattern Recognition and Artificial Intelligence, 2003, 17(3): 349-365
  • [3] Biased Mixtures of Experts: Enabling Computer Vision Inference Under Data Transfer Limitations
    Abbas, Alhabib; Andreopoulos, Yiannis
    IEEE Transactions on Image Processing, 2020, 29: 7656-7667
  • [4] Mixtures of Heterogeneous Experts
    Parton, Callum; Engelbrecht, Andries
    2020 4th International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence (ISMSI 2020), 2020: 1-7
  • [5] Mixtures of hard and soft grains: micromechanical behavior at large strains
    Mollon, Guilhem
    Granular Matter, 2018, 20(3)
  • [6] Supervised Topic Regression via Experts
    Lin, Song; Guo, Ping; Xin, Xin
    Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), 2014: 3526-3533
  • [7] Computer vision - not for experts only
    Waks, S.
    Computers & Education, 1990, 14(2): 173-181
  • [8] On the identifiability of mixtures-of-experts
    Jiang, W.; Tanner, M.A.
    Neural Networks, 1999, 12(9): 1253-1258
  • [9] Functional mixtures-of-experts
    Chamroukhi, Faicel; Pham, Nhat Thien; Hoang, Van Ha; McLachlan, Geoffrey J.
    Statistics and Computing, 2024, 34(3)