Large-Scale Training Framework for Video Annotation

Authors
Hwang, Seong Jae [1, 2]
Lee, Joonseok [2 ]
Varadarajan, Balakrishnan [2 ]
Gordon, Ariel [2 ]
Xu, Zheng [2 ]
Natsev, Apostol [2 ]
Affiliations
[1] Univ Wisconsin, Madison, WI 53706 USA
[2] Google Res, Mountain View, CA USA
Keywords
Scalability; Distributed framework; Video annotation; MapReduce
DOI
10.1145/3292500.3330653
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Video is one of the richest sources of information available online, but extracting deep insights from video content at internet scale remains an open problem in terms of depth of understanding, breadth of understanding, and scale. Over the last few years, the field of video understanding has made great strides due to the availability of large-scale video datasets and core advances in image, audio, and video modeling architectures. However, architectures that are state of the art on small-scale datasets are frequently impractical at internet scale, both to train on hundreds of millions of videos and to deploy for inference on billions of videos. In this paper, we present a MapReduce-based training framework that exploits both data parallelism and model parallelism to scale the training of complex video models. The proposed framework uses alternating optimization and full-batch fine-tuning, and supports large Mixture-of-Experts classifiers with hundreds of thousands of mixtures. This enables a trade-off between model depth and breadth, and makes it possible to shift model capacity between shared (generalization) layers and per-class (specialization) layers. We demonstrate that the proposed framework reaches state-of-the-art performance on the largest public video datasets, YouTube-8M and Sports-1M, and can scale to datasets 100 times larger.
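The abstract mentions per-class Mixture-of-Experts classifiers without giving their form. As a rough illustration only, the sketch below shows one common generic formulation: for each class, a softmax gate weights several logistic experts. It is not taken from the paper; the function name `moe_predict`, the weight shapes, and the random inputs are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    # Element-wise logistic function.
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def moe_predict(x, gate_w, expert_w):
    """Generic per-class Mixture-of-Experts scores (hypothetical sketch).

    x:        (batch, dim) input features
    gate_w:   (num_classes, num_mixtures, dim) gating weights
    expert_w: (num_classes, num_mixtures, dim) expert weights
    returns:  (batch, num_classes) probabilities in [0, 1]
    """
    gate_logits = np.einsum("bd,cmd->bcm", x, gate_w)
    expert_logits = np.einsum("bd,cmd->bcm", x, expert_w)
    gates = softmax(gate_logits, axis=-1)   # mixture weights sum to 1 per class
    experts = sigmoid(expert_logits)        # each expert is a logistic classifier
    return (gates * experts).sum(axis=-1)   # gate-weighted average of experts


# Assumed toy sizes: 4 examples, 16 features, 3 classes, 2 mixtures per class.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
gate_w = rng.normal(size=(3, 2, 16)) * 0.1
expert_w = rng.normal(size=(3, 2, 16)) * 0.1
probs = moe_predict(x, gate_w, expert_w)
print(probs.shape)
```

In this formulation each class owns its own gate and expert weights, so the per-class (specialization) capacity grows with the number of classes times the number of mixtures, while any shared layers producing `x` would hold the generalization capacity.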
Pages: 2394-2402
Number of pages: 9