Visual Event Recognition in Videos by Learning from Web Data

Cited by: 76
Authors
Duan, Lixin [1]
Xu, Dong [1]
Tsang, Ivor W. [1]
Luo, Jiebo [2]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
DOI
10.1109/CVPR.2010.5539870
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a visual event recognition framework for consumer-domain videos that leverages a large amount of loosely labeled web videos (e.g., from YouTube). First, we propose a new aligned space-time pyramid matching method to measure the distance between two video clips, where each clip is divided into space-time volumes over multiple levels. We calculate the pairwise distances between any two volumes and further integrate the information from different volumes with the Integer-flow Earth Mover's Distance (EMD) to explicitly align the volumes. Second, we propose a new cross-domain learning method in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time features and static SIFT features) and 2) cope with the considerable variation in feature distributions between videos from the two domains (i.e., the web domain and the consumer domain). For each pyramid level and each type of local feature, we train a set of SVM classifiers on the combined training set from the two domains using multiple base kernels of different kernel types and parameters, and fuse them with equal weights to obtain an average classifier. Finally, this cross-domain learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), learns an adapted classifier based on multiple base kernels and the prelearned average classifiers by minimizing both the structural risk functional and the mismatch between the data distributions of the two domains. Extensive experiments demonstrate the effectiveness of the proposed framework, which requires only a small number of labeled consumer videos because it leverages web data.
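The volume-alignment step of the aligned space-time pyramid matching can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes both clips are split into the same number R of equally weighted space-time volumes at a given pyramid level, and that the pairwise volume distances are already available as a square matrix (how those distances are computed is outside the sketch). The helper name `aligned_level_distance` and the toy values are hypothetical. Under these assumptions the integer-flow constraint forces each volume of one clip to be matched to exactly one volume of the other, so the Integer-flow EMD reduces to a linear assignment problem, solved here by brute force over all one-to-one matchings.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def aligned_level_distance(dist: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Integer-flow EMD between two clips at one pyramid level (sketch).

    dist[r][c] is a precomputed distance between volume r of clip A and
    volume c of clip B. Assumption: both clips have the same number R of
    equally weighted volumes, so integer flows give a one-to-one matching.
    Returns (distance, matching) where matching[r] is the volume of clip B
    aligned with volume r of clip A.
    """
    R = len(dist)
    if R == 0 or any(len(row) != R for row in dist):
        raise ValueError("dist must be a non-empty square matrix")
    # Exhaustive search over all one-to-one matchings; only practical for
    # small R, since it evaluates R! candidate matchings.
    best_cost, best_perm = min(
        (sum(dist[r][perm[r]] for r in range(R)), perm)
        for perm in permutations(range(R))
    )
    # Each matched pair carries one unit of flow, so total flow is R.
    return best_cost / R, list(best_perm)


if __name__ == "__main__":
    # Hypothetical 2-volume example: volume 0 of clip A is closest to
    # volume 1 of clip B, and vice versa.
    toy = [[4.0, 1.0],
           [2.0, 5.0]]
    d, match = aligned_level_distance(toy)
    print(d, match)  # 1.5 [1, 0]
```

In this toy case the aligned distance is 1.5, whereas pairing volumes by fixed position would give (4 + 5) / 2 = 4.5; the alignment lets a volume in one clip match a differently placed volume in the other. Because brute force grows as R!, a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice when a level has many volumes.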
Pages: 1959-1966
Number of pages: 8
Related papers (10 of 50 shown)
  • [1] Visual Event Recognition in Videos by Learning from Web Data
    Duan, Lixin
    Xu, Dong
    Tsang, Ivor Wai-Hung
    Luo, Jiebo
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (09) : 1667 - 1680
  • [2] Event Recognition in Videos by Learning from Heterogeneous Web Sources
    Chen, Lin
    Duan, Lixin
    Xu, Dong
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013 : 2666 - 2673
  • [3] Action and Event Recognition in Videos by Learning From Heterogeneous Web Sources
    Niu, Li
    Xu, Xinxing
    Chen, Lin
    Duan, Lixin
    Xu, Dong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (06) : 1290 - 1304
  • [4] Learning From Web Videos for Event Classification
    Chesneau, Nicolas
    Alahari, Karteek
    Schmid, Cordelia
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2018, 28 (10) : 3019 - 3029
  • [5] Visual Recognition in RGB Images and Videos by Learning from RGB-D Data
    Li, Wen
    Chen, Lin
    Xu, Dong
    Van Gool, Luc
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (08) : 2030 - 2036
  • [6] Visual Recognition by Learning from Web Data: A Weakly Supervised Domain Generalization Approach
    Niu, Li
    Li, Wen
    Xu, Dong
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015 : 2774 - 2783
  • [7] Visual Recognition by Learning From Web Data via Weakly Supervised Domain Generalization
    Niu, Li
    Li, Wen
    Xu, Dong
    Cai, Jianfei
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (09) : 1985 - 1999
  • [8] Extracting Key Segments of Videos for Event Detection by Learning From Web Sources
    Song, Hao
    Wu, Xinxiao
    Yu, Wennan
    Jia, Yunde
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (05) : 1088 - 1100
  • [9] Mining Event Structures from Web Videos
    Wu, Xiao
    Lu, Yi-Jie
    Peng, Qiang
    Ngo, Chong-Wah
    IEEE MULTIMEDIA, 2011, 18 (01) : 38 - 51
  • [10] ERA: A Data Set and Deep Learning Benchmark for Event Recognition in Aerial Videos [Software and Data Sets]
    Mou, Lichao
    Hua, Yuansheng
    Jin, Pu
    Zhu, Xiao Xiang
    IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2020, 8 (04) : 125 - 133