Exploiting spatio-temporal knowledge for video action recognition

Cited by: 3
Authors
Zhang, Huigang [1]
Wang, Liuan [1]
Sun, Jun [1]
Affiliations
[1] Fujitsu R&D Ctr, Beijing 100022, Peoples R China
Keywords
action recognition; commonsense knowledge; GCN; STKM;
DOI
10.1049/cvi2.12154
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition has been a popular area of computer vision research in recent years. The goal of this task is to recognise human actions in video frames. Most existing methods depend on visual features and the relationships among them within a video. The extracted features represent only the visual information of the current video and cannot capture general knowledge about particular actions beyond that video. As a result, these features are biased, and recognition performance still has room for improvement. In this study, we present a novel spatio-temporal knowledge module (STKM) to endow current methods with commonsense knowledge. To this end, we first collect hybrid external knowledge from universal fields, which contains both visual and semantic information. Graph convolutional networks (GCNs) are then used to represent and aggregate this knowledge. The GCNs involve (i) a spatial graph to capture spatial relations and (ii) a temporal graph to capture serial occurrence relations among actions. Integrating this knowledge with visual features yields better recognition results. Experiments on the AVA, UCF101-24 and JHMDB datasets show the robustness and generalisation ability of STKM. The results report a new state-of-the-art 32.0 mAP on AVA v2.1. On the UCF101-24 and JHMDB datasets, our method also improves by 1.5 AP and 2.6 AP, respectively, over the baseline method.
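The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: action classes are treated as graph nodes, one graph convolution runs over a spatial graph and another over a temporal graph, and the resulting per-class knowledge embeddings are combined with a visual feature. Everything specific here is an assumption for illustration, not the authors' STKM: the function names (`row_normalise`, `gcn_layer`, `knowledge_embeddings`, `class_scores`), the row normalisation with self-loops, the concatenation of the two graph outputs, and the dot-product fusion.

```python
def row_normalise(adj):
    """Add self-loops, then scale each row so that it sums to 1."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return [[v / sum(row) for v in row] for row in a_hat]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    out = matmul(matmul(row_normalise(adj), feats), weight)
    return [[max(v, 0.0) for v in row] for row in out]


def knowledge_embeddings(spatial_adj, temporal_adj, feats, w_s, w_t):
    """Run one layer per graph and concatenate the results for each node.

    Assumption: the temporal graph is directed (action i is followed by
    action j), while the spatial graph is symmetric.
    """
    h_s = gcn_layer(spatial_adj, feats, w_s)
    h_t = gcn_layer(temporal_adj, feats, w_t)
    return [s + t for s, t in zip(h_s, h_t)]


def class_scores(visual_feat, embeddings):
    """Assumed fusion: dot product of the visual feature with each class embedding."""
    return [sum(v * e for v, e in zip(visual_feat, emb)) for emb in embeddings]


# Toy example with two hypothetical action classes.
spatial = [[0.0, 1.0], [1.0, 0.0]]    # the two actions co-occur in space
temporal = [[0.0, 1.0], [0.0, 0.0]]   # action 0 tends to be followed by action 1
node_feats = [[1.0, 0.0], [0.0, 1.0]] # stand-in for semantic/visual node features
identity = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for learned weights

emb = knowledge_embeddings(spatial, temporal, node_feats, identity, identity)
scores = class_scores([1.0, 0.0, 0.0, 2.0], emb)
```

In a trained model the weight matrices would be learned and the node features would come from the collected external knowledge; the toy values above only show how information propagates along the two kinds of edges.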
Pages: 222-230
Page count: 9