SUM-MAX VIDEO POOLING FOR COMPLEX EVENT RECOGNITION

Authors
Phan, Sang [1, 2]
Duy-Dinh Le [2, 3]
Satoh, Shin'ichi [2]
Affiliations
[1] Grad Univ Adv Studies SOKENDAI, Hayama, Japan
[2] Natl Inst Informat, Tokyo, Japan
[3] Univ Informat Technol, Multimedia Commun Lab, Ho Chi Minh, Vietnam
Keywords
video representation; sum-pooling; max-pooling; sum-max video pooling; multimedia event detection; FEATURES
DOI
Not available
Chinese Library Classification number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
A video can be viewed as a layered structure in which the lowest layer consists of frames, the top layer is the entire video, and the middle layers are sequences of consecutive frames or concatenations of lower layers. While it is easy to find local discriminative features in the lower layers of a video, it is non-trivial to aggregate these features into a discriminative video representation. In the literature, sum pooling is often used and obtains reasonable recognition performance on artificial videos. However, sum pooling does not work well on complex videos because the regions of interest may reside within some of the middle layers. In this paper, we leverage the layered structure of video to propose a new pooling method, named sum-max video pooling, to handle this problem. We apply sum pooling to the low-layer representation and max pooling to the high-layer representation. Sum pooling keeps sufficient relevant features at the low layer, while max pooling retrieves the most relevant features at the high layer, so that irrelevant features are discarded from the final video representation. Experimental results on the TRECVID Multimedia Event Detection 2010 dataset show the effectiveness of our method.
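The two-stage pooling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the middle layers are formed or how features are normalized, so the fixed-length, non-overlapping segments, the `segment_length` parameter, and all function names below are assumptions made for illustration. Frame features are sum-pooled within each segment (low layer), and the resulting segment vectors are max-pooled into one video vector (high layer).

```python
from typing import List, Sequence

Vector = List[float]


def sum_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise sum of equally sized feature vectors."""
    return [sum(column) for column in zip(*vectors)]


def max_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise maximum of equally sized feature vectors."""
    return [max(column) for column in zip(*vectors)]


def sum_max_pool(frame_features: Sequence[Sequence[float]],
                 segment_length: int) -> Vector:
    """Sum-pool frames inside each segment, then max-pool across segments.

    Assumption of this sketch: segments are fixed-length and non-overlapping.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    if not frame_features:
        raise ValueError("frame_features must not be empty")
    segments = [
        frame_features[start:start + segment_length]
        for start in range(0, len(frame_features), segment_length)
    ]
    # Low layer: aggregate the frames of each segment by summation.
    segment_vectors = [sum_pool(segment) for segment in segments]
    # High layer: keep the strongest response per dimension across segments.
    return max_pool(segment_vectors)


# Toy video with 6 frames and 2 feature dimensions (made-up values).
# Dimension 0 fires weakly in most frames; dimension 1 fires only in
# the middle segment.
frames = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 3.0],
    [0.0, 3.0],
    [1.0, 0.0],
    [1.0, 0.0],
]

video_sum = sum_pool(frames)                            # plain sum pooling
video_sum_max = sum_max_pool(frames, segment_length=2)  # sum-max pooling
print(video_sum, video_sum_max)
```

In this toy input, plain sum pooling accumulates the weak but frequent response in dimension 0 over the whole video, whereas the sum-max variant limits it to the strongest single segment and still preserves the full response of the middle segment in dimension 1. Setting `segment_length` to the number of frames reduces the sketch to plain sum pooling, and setting it to 1 reduces it to plain max pooling over frames.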
Pages: 1026-1030
Number of pages: 5