Hybrid embedding for multimodal few-frame action recognition

Cited by: 0
Authors
Shafizadegan, Fatemeh [1]
Naghsh-Nilchi, Ahmad Reza [1]
Shabaninia, Elham [2]
Affiliations
[1] Univ Isfahan, Fac Comp Engn, Dept Artificial Intelligence Engn, Esfahan, Iran
[2] Grad Univ Adv Technol, Fac Sci & Modern Technol, Dept Appl Math, Kerman, Iran
Keywords
Action recognition; Vision transformer; Few-frame; Hybrid embedding
DOI
10.1007/s00530-025-01676-x
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
In recent years, action recognition has witnessed significant advancements. However, most existing approaches depend heavily on the availability of large amounts of video data, which can be computationally expensive and time-consuming to process, especially in real-time applications with limited computational resources. Using too few frames instead may lead to the loss of crucial information. Therefore, selecting a few frames in a way that preserves essential information poses a challenge. To address this issue, this paper proposes a novel video clip embedding technique called Hybrid Embedding. This technique combines the advantages of uniform frame sampling and tubelet embedding to enhance recognition with few frames. By employing a transformer-based architecture, the approach captures both spatial and temporal information from limited video frames. Furthermore, a keyframe extraction method is introduced to select more informative and diverse frames, which is crucial when only a few frames are available. In addition, the region of interest (ROI) in each RGB frame is cropped using skeletal data to enhance spatial attention. The study also explores the impact of the number of frames, different modalities, various transformer models, and pretraining on few-frame human action recognition. Experimental results demonstrate the effectiveness of the proposed embedding technique in few-frame action recognition. These findings contribute to addressing the challenge of action recognition with limited frames and shed light on the potential of transformers in this domain.
Pages: 20