Hybrid embedding for multimodal few-frame action recognition

Cited by: 0
Authors
Shafizadegan, Fatemeh [1 ]
Naghsh-Nilchi, Ahmad Reza [1 ]
Shabaninia, Elham [2 ]
Affiliations
[1] Univ Isfahan, Fac Comp Engn, Dept Artificial Intelligence Engn, Esfahan, Iran
[2] Grad Univ Adv Technol, Fac Sci & Modern Technol, Dept Appl Math, Kerman, Iran
Keywords
Action recognition; Vision transformer; Few-frame; Hybrid embedding
DOI
10.1007/s00530-025-01676-x
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
In recent years, action recognition has witnessed significant advances. However, most existing approaches depend heavily on large amounts of video data, which is computationally expensive and time-consuming to process, especially in real-time applications with limited computational resources. Using too few frames, on the other hand, may lead to the loss of crucial information. Selecting a few frames in a way that preserves essential information therefore poses a challenge. To address this issue, this paper proposes a novel video clip embedding technique called Hybrid Embedding, which combines the advantages of uniform frame sampling and tubelet embedding to improve recognition from few frames. By employing a transformer-based architecture, the approach captures both spatial and temporal information from a limited number of video frames. Furthermore, a keyframe extraction method is introduced to select more informative and diverse frames, which is crucial when only a few frames are available. In addition, the region of interest (ROI) in each RGB frame is cropped using skeletal data to enhance spatial attention. The study also explores the impact of the number of frames, different modalities, various transformer models, and pretraining on few-frame human action recognition. Experimental results demonstrate the effectiveness of the proposed embedding technique for few-frame action recognition. These findings contribute to addressing the challenge of action recognition with limited frames and shed light on the potential of transformers in this domain.
Pages: 20
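
To make the abstract's description concrete, the sketch below shows one way a hybrid video-clip embedding could combine ViT-style patch tokens from uniformly sampled frames with ViViT-style tubelet tokens and hand the concatenated sequence to a transformer encoder. This is a minimal illustration only: the module name HybridEmbedding, the tensor shapes, and fusion by simple token concatenation are assumptions, not details taken from the paper.

```python
# Minimal sketch of a hybrid clip embedding (assumed design, not the authors' code):
# 2D patch tokens from each uniformly sampled frame + 3D tubelet tokens over the clip.
import torch
import torch.nn as nn


class HybridEmbedding(nn.Module):
    def __init__(self, in_ch=3, embed_dim=768, patch=16, tubelet_t=2):
        super().__init__()
        # ViT-style 2D patch embedding applied to each sampled frame.
        self.frame_embed = nn.Conv2d(in_ch, embed_dim,
                                     kernel_size=patch, stride=patch)
        # ViViT-style 3D tubelet embedding spanning tubelet_t consecutive frames.
        self.tubelet_embed = nn.Conv3d(in_ch, embed_dim,
                                       kernel_size=(tubelet_t, patch, patch),
                                       stride=(tubelet_t, patch, patch))

    def forward(self, clip):
        # clip: (B, C, T, H, W) few-frame RGB clip; T assumed divisible by tubelet_t.
        b, c, t, h, w = clip.shape

        # Uniform frame sampling branch: embed every frame independently.
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        frame_tok = self.frame_embed(frames)            # (B*T, D, H/p, W/p)
        frame_tok = frame_tok.flatten(2).transpose(1, 2)
        frame_tok = frame_tok.reshape(b, -1, frame_tok.shape[-1])

        # Tubelet branch: embed spatio-temporal tubes over the whole clip.
        tube_tok = self.tubelet_embed(clip)             # (B, D, T/t, H/p, W/p)
        tube_tok = tube_tok.flatten(2).transpose(1, 2)

        # Concatenate both token streams; a transformer encoder would consume this.
        return torch.cat([frame_tok, tube_tok], dim=1)


if __name__ == "__main__":
    tokens = HybridEmbedding()(torch.randn(2, 3, 8, 224, 224))
    print(tokens.shape)  # torch.Size([2, 2352, 768])
```

In this sketch the 2D branch preserves per-frame spatial detail while the 3D branch captures short-range motion within each tubelet, which is the kind of complementarity the abstract attributes to combining uniform frame sampling with tubelet embedding.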