SUM-MAX VIDEO POOLING FOR COMPLEX EVENT RECOGNITION

Authors
Phan, Sang [1, 2]
Duy-Dinh Le [2, 3]
Satoh, Shin'ichi [2]
Affiliations
[1] Grad Univ Adv Studies SOKENDAI, Hayama, Japan
[2] Natl Inst Informat, Tokyo, Japan
[3] Univ Informat Technol, Multimedia Commun Lab, Ho Chi Minh, Vietnam
Keywords
video representation; sum-pooling; max-pooling; sum-max video pooling; multimedia event detection; FEATURES;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
A video can be viewed as a layered structure in which the lowest layer consists of frames, the top layer is the entire video, and the middle layers are sequences of consecutive frames or concatenations of lower layers. While it is easy to find local discriminative features in the lower layers of a video, it is non-trivial to aggregate these features into a discriminative video representation. In the literature, sum pooling is often used and obtains reasonable recognition performance on artificial videos. However, sum pooling does not work well on complex videos because the regions of interest may reside within some of the middle layers. In this paper, we leverage the layered structure of video to propose a new pooling method, named sum-max video pooling, to handle this problem. Basically, we apply sum pooling to the low-layer representation and max pooling to the high-layer representation. Sum pooling keeps sufficient relevant features at the low layer, while max pooling retrieves the most relevant features at the high layer and can therefore discard irrelevant features from the final video representation. Experimental results on the TRECVID Multimedia Event Detection 2010 dataset show the effectiveness of our method.
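The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the layers are built, so the fixed-length, non-overlapping segments, the `segment_len` parameter, and the function names `sum_pool`, `max_pool`, and `sum_max_pool` are all assumptions made here for illustration. Frame-level feature vectors are sum-pooled within each segment (the low layer), and the resulting segment vectors are max-pooled element-wise (the high layer).

```python
from typing import List, Sequence

Vector = List[float]


def sum_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise sum of equal-length feature vectors."""
    return [sum(column) for column in zip(*vectors)]


def max_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise maximum of equal-length feature vectors."""
    return [max(column) for column in zip(*vectors)]


def sum_max_pool(frame_features: Sequence[Sequence[float]],
                 segment_len: int) -> Vector:
    """Sum-pool frames inside each segment, then max-pool across segments.

    Assumption (not from the source): segments are consecutive,
    non-overlapping runs of `segment_len` frames; the last one may be shorter.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if not frame_features:
        raise ValueError("frame_features must not be empty")

    # Low layer: split the frame sequence into segments and sum-pool each one.
    segment_vectors = [
        sum_pool(frame_features[start:start + segment_len])
        for start in range(0, len(frame_features), segment_len)
    ]
    # High layer: keep, per dimension, the strongest segment response.
    return max_pool(segment_vectors)


# Toy input: dimension 1 fires only in the middle segment, while
# dimension 0 fires weakly in the first and last segments.
frames = [[1.0, 0.0], [1.0, 0.0],
          [0.0, 3.0], [0.0, 3.0],
          [1.0, 0.0], [1.0, 0.0]]

print(sum_pool(frames))         # plain sum pooling over the whole video
print(sum_max_pool(frames, 2))  # sum within segments, max across segments
```

In this toy input, plain sum pooling accumulates the weak responses in dimension 0 across the whole video, whereas the two-stage version keeps only the strongest per-segment response in each dimension.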
Pages: 1026-1030
Number of pages: 5