Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

Cited by: 20
Authors:
Gross, Sam [1]
Ranzato, Marc'Aurelio [1]
Szlam, Arthur [1]
Affiliation:
[1] Facebook AI Research, Menlo Park, CA, USA
DOI: 10.1109/CVPR.2017.540
CLC Number: TP18 (Artificial Intelligence Theory)
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Training convolutional networks (CNNs) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large CNNs that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture-of-experts model can be efficiently trained to good effect on large-scale hashtag (multilabel) prediction tasks. Mixture-of-experts models are not new [7, 3], but in the past researchers have had to devise sophisticated methods to deal with data fragmentation. We show empirically that modern weakly supervised datasets are large enough to support a naive partitioning scheme in which each data point is assigned to a single expert. Because the experts are independent, training them in parallel is easy, and evaluation is cheap for the size of the model. Furthermore, we show that a single decoding layer can be used for all the experts, yielding a unified feature embedding space. We demonstrate that it is feasible (and in fact relatively painless) to train far larger models than could practically be trained with standard CNN architectures, and that the extra capacity is put to good use on current datasets.
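The abstract's recipe (hard-assign each example to exactly one expert, train the experts independently, decode all of them through a single shared layer) can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch sketch: the class name HardMoE, the nearest-centroid routing step standing in for the paper's partitioning, and all parameter names are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a hard mixture of experts, assuming PyTorch.
# All names (HardMoE, trunk, centroids, ...) are illustrative stand-ins
# for the scheme described in the abstract, not the authors' code.
import torch
import torch.nn as nn

class HardMoE(nn.Module):
    def __init__(self, trunk, experts, centroids, embed_dim, num_labels):
        super().__init__()
        self.trunk = trunk                      # small CNN used only for routing
        self.experts = nn.ModuleList(experts)   # independently trainable expert CNNs
        # (K, D) cluster centers over trunk features, one per expert
        self.register_buffer("centroids", centroids)
        # single decoding layer shared by all experts -> unified embedding space
        self.decoder = nn.Linear(embed_dim, num_labels)

    def route(self, x):
        # hard assignment: each image goes to exactly one expert
        f = self.trunk(x)                                     # (B, D) routing features
        return torch.cdist(f, self.centroids).argmin(dim=1)  # (B,) expert indices

    def forward(self, x):
        idx = self.route(x)
        logits = x.new_empty(x.size(0), self.decoder.out_features)
        for k in idx.unique():                   # run only the experts that were selected
            sel = idx == k
            emb = self.experts[int(k)](x[sel])   # (n_k, embed_dim) expert embedding
            logits[sel] = self.decoder(emb)      # shared decoder across all experts
        return logits                            # multilabel logits
```

Because each expert only ever sees its own partition, the experts can be trained on separate GPUs with no communication between them; at test time only one expert runs per image, so evaluation cost is roughly that of a single CNN despite the much larger total parameter count.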
Pages: 5085-5093 (9 pages)
Related Papers (50 records in total)
  • [1] Scaling large learning problems with hard parallel mixtures
    Collobert, R.; Bengio, Y.; Bengio, S.
    Pattern Recognition with Support Vector Machines, Proceedings, 2002, 2388: 8-23
  • [2] Scaling large learning problems with hard parallel mixtures
    Collobert, R.; Bengio, Y.; Bengio, S.
    International Journal of Pattern Recognition and Artificial Intelligence, 2003, 17(3): 349-365
  • [3] Biased Mixtures of Experts: Enabling Computer Vision Inference Under Data Transfer Limitations
    Abbas, Alhabib; Andreopoulos, Yiannis
    IEEE Transactions on Image Processing, 2020, 29: 7656-7667
  • [4] Mixtures of Heterogeneous Experts
    Parton, Callum; Engelbrecht, Andries
    2020 4th International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence (ISMSI 2020), 2020: 1-7
  • [5] Mixtures of hard and soft grains: micromechanical behavior at large strains
    Mollon, Guilhem
    Granular Matter, 2018, 20(3)
  • [6] Supervised Topic Regression via Experts
    Lin, Song; Guo, Ping; Xin, Xin
    Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), 2014: 3526-3533
  • [7] Computer vision - not for experts only
    Waks, S.
    Computers & Education, 1990, 14(2): 173-181
  • [8] On the identifiability of mixtures-of-experts
    Jiang, W.; Tanner, M.A.
    Neural Networks, 1999, 12(9): 1253-1258
  • [9] Functional mixtures-of-experts
    Chamroukhi, Faicel; Pham, Nhat Thien; Hoang, Van Ha; McLachlan, Geoffrey J.
    Statistics and Computing, 2024, 34(3)