DVANet: Disentangling View and Action Features for Multi-View Action Recognition

Cited by: 0
Authors
Siddiqui, Nyle [1 ]
Tirupattur, Praveen [1 ]
Shah, Mubarak [1 ]
Affiliations
[1] University of Central Florida, Center for Research in Computer Vision, Orlando, FL 32816, USA
Keywords: none listed
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
In this work, we present a novel approach to multi-view action recognition in which we guide learned action representations to be separated from view-relevant information in a video. Classifying action instances captured from multiple viewpoints is considerably harder because of differences in background, occlusion, and visibility of the captured action across camera angles. To tackle the various problems introduced in multi-view action recognition, we propose a novel configuration of learnable transformer decoder queries, in conjunction with two supervised contrastive losses, to enforce the learning of action features that are robust to shifts in viewpoint. Our disentangled feature learning occurs in two stages: the transformer decoder uses separate queries to learn action and view information independently, which are then further disentangled using our two contrastive losses. We show that our model and training method significantly outperform all other uni-modal models on four multi-view action recognition datasets: NTU RGB+D, NTU RGB+D 120, PKU-MMD, and N-UCLA. Compared to previous RGB works, we see maximum improvements of 1.5%, 4.8%, 2.2%, and 4.8% on these datasets, respectively. Our code can be found here: https://github.com/NyleSiddiqui/MultiView_Actions
Pages: 4873-4881 (9 pages)
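The abstract describes two ingredients that are easy to picture in code: a transformer decoder whose learnable queries are split into an action group and a view group, and a pair of supervised contrastive losses that cluster the pooled features by action label and by view label, respectively. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea, not the authors' implementation (their code is linked in the abstract); `DisentangledDecoder`, `supervised_contrastive_loss`, the query counts, feature dimensions, loss weighting, and example labels are all assumptions made for illustration.

```python
# Minimal, hypothetical sketch (not the authors' code): a transformer decoder
# with separate learnable queries for action and view, each trained with a
# supervised contrastive loss on the matching labels. Assumes PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss: pull together features that share a label,
    push apart features with different labels."""
    features = F.normalize(features, dim=1)                       # (B, D)
    sim = features @ features.t() / temperature                   # (B, B)
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=features.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    # Log-probability of each pair, excluding self-similarity from the denominator.
    exp_sim = torch.exp(sim) * not_self
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-8)
    # Average log-likelihood of the positives for each anchor that has any.
    pos_counts = pos_mask.sum(dim=1)
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_counts.clamp(min=1)
    return loss[pos_counts > 0].mean()


class DisentangledDecoder(nn.Module):
    """Illustrative decoder with separate action and view query groups."""

    def __init__(self, dim=256, n_action_q=4, n_view_q=4, n_actions=60, n_views=3):
        super().__init__()
        self.action_queries = nn.Parameter(torch.randn(n_action_q, dim))
        self.view_queries = nn.Parameter(torch.randn(n_view_q, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.action_head = nn.Linear(dim, n_actions)
        self.view_head = nn.Linear(dim, n_views)

    def forward(self, video_tokens):
        # video_tokens: (B, T, dim) features from some video backbone.
        B = video_tokens.size(0)
        queries = torch.cat([self.action_queries, self.view_queries], dim=0)
        queries = queries.unsqueeze(0).expand(B, -1, -1)
        out = self.decoder(queries, video_tokens)                 # (B, Qa + Qv, dim)
        n_a = self.action_queries.size(0)
        action_feat = out[:, :n_a].mean(dim=1)                    # pooled action features
        view_feat = out[:, n_a:].mean(dim=1)                      # pooled view features
        return action_feat, view_feat, self.action_head(action_feat), self.view_head(view_feat)


# Toy training step with made-up labels (8 clips, 16 backbone tokens each).
model = DisentangledDecoder()
tokens = torch.randn(8, 16, 256)
action_labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
view_labels = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])
a_feat, v_feat, a_logits, v_logits = model(tokens)
loss = (F.cross_entropy(a_logits, action_labels)
        + F.cross_entropy(v_logits, view_labels)
        + supervised_contrastive_loss(a_feat, action_labels)      # cluster by action
        + supervised_contrastive_loss(v_feat, view_labels))       # cluster by view
loss.backward()
```

Keeping the action and view queries in separate groups, and supervising each group's pooled output with its own label set, is the general pattern the abstract outlines; the exact architecture, losses, and training details should be taken from the linked repository.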