Towards efficient video-based action recognition: context-aware memory attention network

Times cited: 2

Authors:

Koh, Thean Chun [1]

Yeo, Chai Kiat [1]

Jing, Xuan [1,2]

Sivadas, Sunil [2]

Affiliations:

[1] Nanyang Technol Univ, Sch Comp Sci & Engn, 50 Nanyang Ave, Singapore 639798, Singapore

[2] NCS Pte Ltd, Ang Mo Kio St 62, Singapore 569141, Singapore

Source:

SN APPLIED SCIENCES | 2023 / Vol. 5 / Issue 12

Keywords:

Action recognition; Deep learning; Convolutional neural network; Attention; Bidirectional LSTM; Classification

DOI:

10.1007/s42452-023-05568-5

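Chinese Library Classification (CLC) numbers:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General]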

Discipline classification codes:

07; 0710; 09

Abstract:

Given the prevalence of surveillance cameras in daily life, human action recognition from video has significant practical applications. A persistent challenge in this field is developing more efficient models that can recognize actions in real time with high accuracy, so that they can be widely deployed. In this paper, we introduce a novel human action recognition model, the Context-Aware Memory Attention Network (CAMA-Net), which eliminates the need for computationally intensive optical flow extraction and 3D convolution. By removing these components, CAMA-Net achieves higher computational efficiency than many existing approaches. A pivotal component of CAMA-Net is the Context-Aware Memory Attention Module, an attention module that computes relevance scores between key-value pairs obtained from a 2D ResNet backbone, thereby establishing correspondences between video frames. To validate our method, we conduct experiments on four well-known action recognition datasets: ActivityNet, Diving48, HMDB51 and UCF101. The experimental results demonstrate the effectiveness of the proposed model, which surpasses existing 2D-CNN-based baseline models.

Article Highlights
- Recent human action recognition models are not yet ready for practical applications because of their high computational requirements.
- We propose a 2D-CNN-based human action recognition method that reduces the computational load.
- The proposed method achieves competitive performance compared with most state-of-the-art (SOTA) 2D-CNN-based methods on public datasets.
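The abstract describes an attention module that scores the relevance between key-value pairs taken from per-frame 2D CNN features in order to relate video frames to one another. The record gives no further detail, so the following is only a minimal sketch of generic scaled dot-product attention across frames, not the CAMA-Net module itself. It assumes each frame has already been reduced to a single feature vector (for example by pooling a 2D backbone's output) and uses the frame features directly as queries, keys and values, with no learned projections; the function names `frame_attention`, `dot` and `softmax` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_attention(queries: Matrix, keys: Matrix,
                    values: Matrix) -> Tuple[Matrix, Matrix]:
    """Generic scaled dot-product attention over video frames (illustrative only).

    Each argument has one row per frame. Returns (attended, weights), where
    weights[i][j] is how strongly frame i attends to frame j; each row sums to 1.
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    weights: Matrix = []
    attended: Matrix = []
    for q in queries:
        # Relevance score of this query frame against every key frame.
        scores = [dot(q, k) * scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors: a context-enriched frame feature.
        attended.append([sum(w[j] * values[j][c] for j in range(len(values)))
                         for c in range(len(values[0]))])
    return attended, weights


# Hypothetical per-frame features for T=3 frames with d=4 channels.
# Frames 0 and 2 are identical, so frame 0 should attend to them equally.
feats = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [1.0, 0.0, 0.0, 0.0]]
out, attn = frame_attention(feats, feats, feats)
```

The attention matrix `attn` acts as a soft frame-to-frame correspondence map: similar frames receive higher weights, which is the general idea the abstract refers to when it says the module establishes correspondences between video frames.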


Pages: 12
