Video-level Multi-model Fusion for Action Recognition

Cited by: 3
Authors
Wang, Xiaomin [1]
Zhang, Junsan [1]
Wang, Leiquan [1]
Yu, Philip S. [2]
Zhu, Jie [3]
Li, Haisheng [4]
Affiliations
[1] China Univ Petr EastChina, Coll Comp Sci & Technol, Qingdao, Shandong, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60680 USA
[3] Natl Police Univ Criminal Justice, Dept Informat Management, Hangzhou, Peoples R China
[4] Beijing Technol & Business Univ, Beijing Key Lab Big Data Technol Food Safety, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
action recognition; video-level recognition; 3D convolution; multi-model fusion;
DOI
10.1145/3357384.3357935
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Approaches based on spatio-temporal features for video action recognition have emerged, such as two-stream methods and 3D convolution methods. However, current methods suffer from partial observation or are restricted to modeling a single type of information. Segment-level recognition results obtained from dense sampling cannot represent the entire video and therefore lead to partial observation. Moreover, it is hard for a single model to capture the complementary spatial, temporal and spatio-temporal information in a video at the same time. The challenge, therefore, is to build a video-level representation and capture multiple types of information. In this paper, a video-level multi-model fusion method for action recognition is proposed to solve these problems. First, an efficient video-level 3D convolution model is proposed that assembles segment-level 3D convolution models to capture the global information in the video. Second, a multi-model fusion architecture is proposed for video action recognition to capture multiple types of information: the spatial, temporal and spatio-temporal information is aggregated with an SVM classifier. Experimental results show that this method achieves state-of-the-art performance on the UCF-101 dataset (97.6%) without pre-training on Kinetics.
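The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how segment-level outputs are assembled, so simple averaging is assumed here, and the function names `video_level_scores` and `fuse_streams` as well as the toy numbers are hypothetical. The sketch only builds the fused feature vector; in the setting described above, such a vector would then be passed to an SVM classifier, which is omitted.

```python
from statistics import fmean


def video_level_scores(segment_scores):
    """Average per-class scores over all segments of one video.

    Averaging is an assumption made for this sketch; the abstract does
    not specify the aggregation used by the video-level model.
    """
    if not segment_scores:
        raise ValueError("need at least one segment")
    num_classes = len(segment_scores[0])
    return [fmean(seg[c] for seg in segment_scores) for c in range(num_classes)]


def fuse_streams(spatial, temporal, spatio_temporal):
    """Concatenate the video-level scores of three streams into one vector.

    The result is the kind of feature vector a downstream classifier
    (an SVM in the abstract) could be trained on.
    """
    return (
        video_level_scores(spatial)
        + video_level_scores(temporal)
        + video_level_scores(spatio_temporal)
    )


# Toy example: 2 classes, 2 segments per stream (made-up numbers).
spatial = [[0.8, 0.2], [0.6, 0.4]]
temporal = [[0.5, 0.5], [0.7, 0.3]]
spatio_temporal = [[0.9, 0.1], [0.7, 0.3]]

fused = fuse_streams(spatial, temporal, spatio_temporal)
print(len(fused))  # one score per class per stream
```

Concatenating per-stream scores keeps the classifier free to weight each stream differently per class, which is one common motivation for fusing with a learned classifier instead of a fixed average.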
Pages: 159-168
Number of pages: 10
Related papers
50 in total
  • [11] Human action recognition toward massive-scale sport sceneries based on deep multi-model feature fusion
    Zhou, Ersan
    Zhang, Heqing
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 84
  • [12] Deep Multi-Model Fusion for Human Activity Recognition Using Evolutionary Algorithms
    Verma, Kamal Kant
    Singh, Brij Mohan
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2021, 7 (02): : 44 - 58
  • [13] American License Plate Recognition Algorithm Based on Deep Multi-Model Fusion
    Cai, Ying
    Zhang, Yuexin
    Huang, Jie
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 6432 - 6437
  • [14] Human action identification by a quality-guided fusion of multi-model feature
    Bi, Zhuo
    Huang, Wenju
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2021, 116 : 13 - 21
  • [15] Enhancing Word-Level Completion for Masked Language Model with Multi-Model Fusion
    Chang, Xinquan
    Zhu, Junguo
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT IV, NLPCC 2024, 2025, 15362 : 41 - 53
  • [16] Multi-model fusion fault diagnosis
    Zhu, Ping
    Huang, Wenhu
    Jiang, Xingwei
    Yu, Baisheng
    Boyinhexige
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2000, 32 (04): : 1 - 3
  • [17] MLENet: Multi-Level Extraction Network for video action recognition
    Wang, Fan
    Li, Xinke
    Xiong, Han
    Mo, Haofan
    Li, Yongming
    PATTERN RECOGNITION, 2024, 154
  • [18] Human Action Recognition Based On Multi-level Feature Fusion
    Xu, Y. Y.
    Xiao, G. Q.
    Tang, X. Q.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL APPLICATIONS (CISIA 2015), 2015, 18 : 353 - 355
  • [19] Channel attention convolutional aggregation network based on video-level features for EEG emotion recognition
    Feng, Xin
    Cong, Ping
    Dong, Lin
    Xin, Yongxian
    Miao, Fengbo
    Xin, Ruihao
    COGNITIVE NEURODYNAMICS, 2024, 18 (04) : 1689 - 1707
  • [20] Language-guided Multi-Modal Fusion for Video Action Recognition
    Hsiao, Jenhao
    Li, Yikang
    Ho, Chiuman
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3151 - 3155