Large-Scale Training Framework for Video Annotation

Cited by: 0

Authors:

Hwang, Seong Jae [1,2]

Lee, Joonseok [2]

Varadarajan, Balakrishnan [2]

Gordon, Ariel [2]

Xu, Zheng [2]

Natsev, Apostol [2]

Affiliations:

[1] Univ Wisconsin, Madison, WI 53706 USA

[2] Google Res, Mountain View, CA USA

Source:

KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2019

Keywords:

Scalability; Distributed framework; Video annotation; MapReduce

DOI:

10.1145/3292500.3330653

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Video is one of the richest sources of information available online but extracting deep insights from video content at internet scale is still an open problem, both in terms of depth and breadth of understanding, as well as scale. Over the last few years, the field of video understanding has made great strides due to the availability of large-scale video datasets and core advances in image, audio, and video modeling architectures. However, the state-of-the-art architectures on small scale datasets are frequently impractical to deploy at internet scale, both in terms of the ability to train such deep networks on hundreds of millions of videos, and to deploy them for inference on billions of videos. In this paper, we present a MapReduce-based training framework, which exploits both data parallelism and model parallelism to scale training of complex video models. The proposed framework uses alternating optimization and full-batch fine-tuning, and supports large Mixture-of-Experts classifiers with hundreds of thousands of mixtures, which enables a trade-off between model depth and breadth, and the ability to shift model capacity between shared (generalization) layers and per-class (specialization) layers. We demonstrate that the proposed framework is able to reach state-of-the-art performance on the largest public video datasets, YouTube-8M and Sports-1M, and can scale to 100 times larger datasets.
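The abstract's combination of MapReduce-style data parallelism with full-batch updates can be illustrated with a small sketch. This is not the paper's implementation: the logistic-regression model, the function names (`map_shard`, `reduce_partials`, `full_batch_step`), and the learning rate are all assumptions chosen for illustration. It shows only the general pattern, in which each worker computes a partial gradient over its own data shard (map), the partials are summed (reduce), and one update is applied using the gradient of the entire dataset.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def map_shard(weights, shard):
    """Map step (hypothetical): summed log-loss gradient over one data shard."""
    grad = [0.0] * len(weights)
    for features, label in shard:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for i, x in enumerate(features):
            grad[i] += (pred - label) * x
    return grad, len(shard)


def reduce_partials(a, b):
    """Reduce step (hypothetical): add partial gradients and example counts."""
    (grad_a, n_a), (grad_b, n_b) = a, b
    return [x + y for x, y in zip(grad_a, grad_b)], n_a + n_b


def full_batch_gradient(weights, shards):
    """Mean gradient over all shards; the map calls could run on separate workers."""
    partials = [map_shard(weights, shard) for shard in shards]
    grad_sum, count = reduce(reduce_partials, partials)
    return [g / count for g in grad_sum]


def full_batch_step(weights, shards, lr=1.0):
    """One gradient-descent update using the gradient of the whole dataset."""
    grad = full_batch_gradient(weights, shards)
    return [w - lr * g for w, g in zip(weights, grad)]


# Toy data: (features, label) pairs split across two shards.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1), ([1.0, -1.0], 0)]
shards = [data[:2], data[2:]]
new_weights = full_batch_step([0.0, 0.0], shards)
```

Because the reduce step only adds partial sums, the result does not depend on how the examples are split across shards, which is what lets the gradient computation be distributed over many workers.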


Pages: 2394-2402

Page count: 9
