Multi-scale feature learning and temporal probing strategy for one-stage temporal action localization

Cited by: 0
Authors
Yao, Leiyue [1, 2]
Yang, Wei [1]
Huang, Wei [2]
Jiang, Nan [3]
Zhou, Bingbing [1]
Affiliations
[1] Jiangxi Univ Technol, Sch Informat Engn, 99 ZiYang Rd, Nanchang 330098, Jiangxi, Peoples R China
[2] Nanchang Univ, Sch Informat Engn, Nanchang, Jiangxi, Peoples R China
[3] East China Jiaotong Univ, Sch Informat Engn, Nanchang, Jiangxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
action encoding; motion data structure; skeleton-based action recognition; temporal action localization; temporal window merging strategy; HUMAN ACTION RECOGNITION; ATTENTION; VIDEOS;
DOI
10.1002/int.22713
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The aim of temporal action localization (TAL) is to determine the start and end frames of an action in a video. In recent years, TAL has attracted considerable attention because of its increasing applications in video understanding and retrieval. However, precisely estimating the duration of an action in the temporal dimension is still a challenging problem. In this paper, we propose an effective one-stage TAL method based on a self-defined motion data structure, called a dense joint motion matrix (DJMM), and a novel temporal detection strategy. Our method makes three main contributions. First, compared with mainstream motion images, DJMMs preserve more pre-processed motion features and provide more precise detail representations. Furthermore, DJMMs solve the temporal information loss problem caused by motion trajectory overlaps within a certain time period. Second, a spatial pyramid pooling (SPP) layer, which is widely used in the object detection and tracking fields, is incorporated into the proposed method for multi-scale feature learning. Moreover, the SPP layer enables the backbone convolutional neural network (CNN) to receive DJMMs of any size in the temporal dimension. Third, a large-scale-first temporal detection strategy inspired by a well-developed Chinese text segmentation algorithm is proposed to address long-duration videos. Our method is evaluated on two benchmark data sets and one self-collected data set: Florence-3D, UTKinect-Action3D and HanYue-3D. The experimental results show that our method achieves competitive action recognition accuracy and high TAL precision, and that its time efficiency and few-shot learning capabilities enable it to be utilized for real-time surveillance.
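The abstract's claim that an SPP layer lets the network accept inputs of any temporal length can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified one-dimensional (temporal) pyramid with max pooling over a plain list-of-lists feature map, and the function name `temporal_pyramid_pool` and the pyramid levels `(1, 2, 4)` are hypothetical choices made only for this example.

```python
from typing import List, Sequence


def temporal_pyramid_pool(features: Sequence[Sequence[float]],
                          levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool a (T x C) feature map into a fixed-length vector.

    For each pyramid level n, the T time steps are split into n bins and
    every channel is max-pooled inside each bin. The output length is
    C * sum(levels), independent of T. (Illustrative assumption, not the
    paper's exact layer.)
    """
    t = len(features)
    if t == 0:
        raise ValueError("need at least one time step")
    channels = len(features[0])
    pooled: List[float] = []
    for n_bins in levels:
        for b in range(n_bins):
            # floor/ceil bin edges: bins cover the whole time axis and are
            # never empty, even when T is smaller than the number of bins.
            start = (b * t) // n_bins
            end = -((-(b + 1) * t) // n_bins)  # ceil((b + 1) * t / n_bins)
            window = features[start:end]
            for c in range(channels):
                pooled.append(max(row[c] for row in window))
    return pooled


# Two inputs with different temporal lengths (T=3 and T=10), both with C=2.
short_clip = [[1.0, 0.0], [0.5, 2.0], [0.2, 0.1]]
long_clip = [[float(i), float(-i)] for i in range(10)]

# Both map to vectors of the same length: C * (1 + 2 + 4) = 14.
print(len(temporal_pyramid_pool(short_clip)), len(temporal_pyramid_pool(long_clip)))
```

Because the pooled vector has a fixed size, the layers that follow it do not need to know how many frames the input covered, which is the property the abstract relies on.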
Pages: 4092-4112
Page count: 21
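The abstract does not name the Chinese text segmentation algorithm behind the large-scale-first temporal detection strategy. The sketch below assumes the analogy is to forward maximum matching: at each position the largest temporal window is tried first, smaller windows are tried only if the larger ones are rejected, and the scan then jumps past any accepted segment. The function `large_scale_first`, the `classify` callback, the window sizes and the threshold are all hypothetical, and the toy IoU-based classifier merely stands in for the CNN that the paper applies to DJMMs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start_frame, end_frame_exclusive, label)


def large_scale_first(
    n_frames: int,
    classify: Callable[[int, int], Tuple[Optional[str], float]],
    window_sizes: Sequence[int],
    threshold: float = 0.5,
) -> List[Segment]:
    """Greedy scan that tries the largest window first at every position."""
    sizes = sorted(set(window_sizes), reverse=True)
    segments: List[Segment] = []
    pos = 0
    while pos < n_frames:
        for size in sizes:
            end = pos + size
            if end > n_frames:
                continue  # window would run past the end of the video
            label, score = classify(pos, end)
            if label is not None and score >= threshold:
                segments.append((pos, end, label))
                pos = end  # jump past the accepted segment
                break
        else:
            pos += 1  # no window accepted here: advance by one frame
    return segments


def make_toy_classifier(truth: Sequence[Segment]):
    """Stand-in classifier: scores a window by its best IoU with ground truth."""
    def classify(start: int, end: int) -> Tuple[Optional[str], float]:
        best_label, best_iou = None, 0.0
        for s, e, label in truth:
            inter = max(0, min(end, e) - max(start, s))
            union = (end - start) + (e - s) - inter
            iou = inter / union
            if iou > best_iou:
                best_label, best_iou = label, iou
        return best_label, best_iou
    return classify


# Toy 16-frame video with two actions of different durations.
truth = [(2, 8, "wave"), (10, 14, "sit")]
detected = large_scale_first(16, make_toy_classifier(truth),
                             window_sizes=(6, 4, 2), threshold=0.9)
print(detected)
```

Trying large windows first means a long action is accepted as one segment when a large window fits it, instead of being split into several short detections. Whether the published method shrinks, merges or slides windows in exactly this way is not stated in the abstract.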