Discriminative Segment Focus Network for Fine-grained Video Action Recognition

Cited by: 0

Authors:

Sun, Baoli [1]

Ye, Xinchen [2]

Yan, Tiantian [3]

Wang, Zhihui [2]

Li, Haojie [4]

Wang, Zhiyong [5]

Affiliations:

[1] Dalian Univ Technol, Dalian, Liaoning, Peoples R China

[2] Dalian Univ Technol, DUT RU Int Sch Informat Sci & Engn, Dalian, Liaoning, Peoples R China

[3] Dalian Univ, Natl & Local Joint Engn Lab Comp Aided Design, Dalian, Liaoning, Peoples R China

[4] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Shandong, Peoples R China

[5] Univ Sydney, Sch Informat Technol, Sydney, NSW, Australia

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2024 / Vol. 20 / No. 07

Keywords:

Fine-grained action recognition; discriminative segment; correlation

DOI:

10.1145/3654671

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Fine-grained video action recognition aims to identify minor but discriminative variations among fine categories of actions. While many recent action recognition methods have been proposed to better model spatio-temporal representations, they have neglected how to model the interactions among discriminative atomic actions so as to effectively characterize inter-class and intra-class variations, which is vital for understanding fine-grained actions. In this work, we devise a Discriminative Segment Focus Network (DSFNet) to mine the discriminability of segment correlations and localize discriminative action-relevant segments for fine-grained video action recognition. Firstly, we propose a hierarchic correlation reasoning (HCR) module which explicitly establishes correlations between different segments at multiple temporal scales and enhances each segment by exploiting the correlations with other segments. Secondly, a discriminative segment focus (DSF) module is devised to localize the most action-relevant segments from the enhanced representations of HCR by enforcing the consistency between the discriminability and the classification confidence of a given segment with a consistency constraint. Finally, these localized segment representations are combined with the global action representation of the whole video for boosting final recognition. Extensive experimental results on two fine-grained action recognition datasets, i.e., FineGym and Diving48, and two action recognition datasets, i.e., Kinetics400 and Something-Something, demonstrate the effectiveness of our approach compared with the state-of-the-art methods.
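
The idea of correlating segments at multiple temporal scales and enhancing each segment with the others can be illustrated with a small sketch. This is not the authors' HCR module: the abstract does not specify its operations, so average pooling over windows, cosine similarity, softmax weighting, and the residual sum are all assumptions chosen for illustration, as are the function names.

```python
import math

def pool_segments(features, scale):
    """Average every `scale` consecutive segment vectors (last window may be shorter)."""
    pooled = []
    for start in range(0, len(features), scale):
        window = features[start:start + scale]
        dim = len(window[0])
        pooled.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return pooled

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def enhance(segments):
    """Add to each segment a correlation-weighted mix of all segments (assumed residual form)."""
    enhanced = []
    for seg in segments:
        weights = softmax([cosine(seg, other) for other in segments])
        context = [sum(w * other[d] for w, other in zip(weights, segments))
                   for d in range(len(seg))]
        enhanced.append([s + c for s, c in zip(seg, context)])
    return enhanced

def multi_scale_enhance(features, scales=(1, 2, 4)):
    """Return {scale: enhanced segment features} for each assumed temporal scale."""
    return {s: enhance(pool_segments(features, s)) for s in scales}
```

Calling `multi_scale_enhance(features)` on a list of per-segment feature vectors returns one enhanced list per scale; a real model would learn the correlation function rather than use fixed cosine similarity.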


Pages: 20

50 items in total

[41] JOINT LEARNING ON THE HIERARCHY REPRESENTATION FOR FINE-GRAINED HUMAN ACTION RECOGNITION
Leong, Mei Chee
Tan, Hui Li
Zhang, Haosong
Li, Liyuan
Lin, Feng
Lim, Joo Hwee
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1059 - 1063
[42] Fine-grained action recognition using multi-view attentions
Yisheng Zhu
Guangcan Liu
[J]. The Visual Computer, 2020, 36 : 1771 - 1781
[43] Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
Munro, Jonathan
Damen, Dima
[J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 119 - 129
[44] Human Action Recognition Using Deep Data: A Fine-Grained Study
Rao, D. Surendra
Potturu, Sudharsana Rao
Bhagyaraju, V
[J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2022, 22 (06): : 97 - 108
[45] Fine-grained action recognition using multi-view attentions
Zhu, Yisheng
Liu, Guangcan
[J]. VISUAL COMPUTER, 2020, 36 (09): : 1771 - 1781
[46] Temporal and Fine-Grained Pedestrian Action Recognition on Driving Recorder Database
Kataoka, Hirokatsu
Satoh, Yutaka
Aoki, Yoshimitsu
Oikawa, Shoko
Matsui, Yasuhiro
[J]. SENSORS, 2018, 18 (02)
[47] Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization
He, Xiangteng
Peng, Yuxin
Zhao, Junjie
[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2019, 127 (09) : 1235 - 1255
[48] Fine-grained Action Recognition with Robust Motion Representation Decoupling and Concentration
Sun, Baoli
Ye, Xinchen
Yan, Tiantian
Wang, Zhihui
Li, Haojie
Wang, Zhiyong
[J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4779 - 4788
[49] Multi-Modal Domain Adaptation for Fine-grained Action Recognition
Munro, Jonathan
Damen, Dima
[J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 3723 - 3726
[50] Fine-grained action recognition of boxing punches from depth imagery
Kasiri, Soudeh
Fookes, Clinton
Sridharan, Sridha
Morgan, Stuart
[J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2017, 159 : 143 - 153
