A component-based video content representation for action recognition

Cited by: 9

Authors:

Adeli, Vida [1]

Fazl-Ersi, Ehsan [1]

Harati, Ahad [1]

Affiliations:

[1] Ferdowsi Univ Mashhad, Dept Comp Engn, Mashhad 9177948944, Razavi Khorasan, Iran

Source:

IMAGE AND VISION COMPUTING | 2019 / Vol. 90

Keywords:

Actionness likelihood; Action recognition; Action components; LSTM; Three-stream convolutional neural network; MOTION REPRESENTATION;

DOI:

10.1016/j.imavis.2019.08.009

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper investigates the challenging problem of action recognition in videos and proposes a new component-based approach for video content representation. Although satisfactory performance for action recognition has already been obtained for certain scenarios, many of the existing solutions require fully annotated video datasets in which the region of the activity in each frame is specified by a bounding box. Another group of methods requires auxiliary techniques to extract human-related areas in the video frames before being able to accurately recognize actions. In this paper, a Weakly-Supervised Learning (WSL) framework is introduced that eliminates the need for per-frame annotations and learns video representations that improve recognition accuracy and also highlight the activity-related regions within each frame. To this end, two new representation ideas are proposed: one focuses on representing the main components of an action, i.e. actionness regions, and the other focuses on encoding the background context to represent general and holistic cues. A three-stream CNN is developed, which takes the two proposed representations and combines them with a motion-encoding stream. Temporal cues in each of the three streams are modeled through LSTM, and finally fully-connected neural network layers are used to fuse the streams and produce the final video representation. Experimental results on four challenging datasets demonstrate that the proposed Component-based Multi-stream CNN model (CM-CNN), trained in a WSL setting, outperforms the state of the art in action recognition, including fully-supervised approaches. (C) 2019 Elsevier B.V. All rights reserved.
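
The fusion stage described in the abstract (one LSTM per stream, followed by fully-connected layers) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the per-frame features are random stand-ins for CNN outputs, the weights are untrained, and every name and dimension (`init_lstm`, `fuse_streams`, `feat_dim`, `hid_dim`, the softmax classifier on top) is a hypothetical choice made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(rng, in_dim, hid_dim):
    # One weight matrix covering the four gates (input, forget, cell, output).
    return {"W": rng.normal(0, 0.1, (in_dim + hid_dim, 4 * hid_dim)),
            "b": np.zeros(4 * hid_dim)}

def lstm_last_hidden(params, frames):
    """Run an LSTM over frames of shape (T, in_dim); return the last hidden state."""
    hid_dim = params["b"].size // 4
    h = np.zeros(hid_dim)
    c = np.zeros(hid_dim)
    for x in frames:
        z = np.concatenate([x, h]) @ params["W"] + params["b"]
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def fuse_streams(stream_feats, lstms, fc_w, fc_b, cls_w, cls_b):
    """Summarize each stream with its own LSTM, then fuse with a fully-connected layer."""
    summaries = [lstm_last_hidden(p, f) for p, f in zip(lstms, stream_feats)]
    fused_in = np.concatenate(summaries)
    video_repr = np.maximum(0.0, fused_in @ fc_w + fc_b)  # FC + ReLU (assumed)
    logits = video_repr @ cls_w + cls_b                   # classifier (assumed)
    exp = np.exp(logits - logits.max())
    return video_repr, exp / exp.sum()

# Hypothetical sizes: 8 frames, 16-d features per stream, 5 action classes.
rng = np.random.default_rng(0)
T, feat_dim, hid_dim, repr_dim, n_classes = 8, 16, 12, 20, 5
# Stand-ins for the actionness, background-context, and motion streams.
streams = [rng.normal(size=(T, feat_dim)) for _ in range(3)]
lstms = [init_lstm(rng, feat_dim, hid_dim) for _ in range(3)]
fc_w = rng.normal(0, 0.1, (3 * hid_dim, repr_dim))
fc_b = np.zeros(repr_dim)
cls_w = rng.normal(0, 0.1, (repr_dim, n_classes))
cls_b = np.zeros(n_classes)

video_repr, probs = fuse_streams(streams, lstms, fc_w, fc_b, cls_w, cls_b)
print(video_repr.shape, probs.shape)
```

Keeping a separate LSTM per stream lets each stream model its own temporal dynamics before fusion; whether the paper fuses final hidden states or some other temporal summary is not stated in the abstract, so the last-hidden-state choice here is an assumption.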


Pages: 15

