Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition

Times cited: 14

Authors:

Li, Tianjiao [1]

Foo, Lin Geng [1]

Ke, Qiuhong [2]

Rahmani, Hossein [3]

Wang, Anran [4]

Wang, Jinghua [5]

Liu, Jun [1]

Affiliations:

[1] Singapore Univ Technol & Design, ISTD Pillar, Singapore, Singapore

[2] Monash Univ, Dept Data Sci & AI, Melbourne, Vic, Australia

[3] Univ Lancaster, Sch Comp & Commun, Lancaster, England

[4] ByteDance, Beijing, Peoples R China

[5] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China

Source:

COMPUTER VISION - ECCV 2022, PT IV | 2022 / Vol. 13664

Funding:

National Research Foundation, Singapore

Keywords:

Action recognition; Fine-grained; Dynamic neural networks; HUMAN NEURAL SYSTEM; FACE; REPRESENTATIONS; IDENTITY;

DOI:

10.1007/978-3-031-19772-7_23

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we draw inspiration from the human visual system, which contains specialized brain regions dedicated to handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are activated only for a subset of highly similar samples. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences to distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or more temporal fine-grained information, to better handle the wide range of spatio-temporal variations in the videos. Lastly, we design an Upstream-Downstream Learning algorithm to optimize our model's dynamic decisions during training, improving the performance of our DSTS module. We obtain state-of-the-art performance on two widely-used fine-grained action recognition datasets.
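The abstract describes neurons that fire only for a subset of similar samples. As a rough illustration of that idea only (it is not the DSTS module and does not reproduce the paper's architecture or training), the toy sketch below gives each unit a hypothetical per-sample gate: a sigmoid of a linear score, thresholded into a hard open/closed decision. All weights, the gate form, and the names `gated_layer` and `active_units` are assumptions made for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical toy parameters (assumptions, not from the paper):
# 3 "specialized" units over a 2-D input. Each unit has response weights
# plus a separate gate that decides, per sample, whether the unit fires.
RESPONSE_WEIGHTS = [[1.0, 0.5], [-0.5, 1.0], [-0.3, -0.2]]
GATE_WEIGHTS = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]
GATE_BIAS = [-1.0, -1.0, 0.5]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gate_is_open(unit: int, x: Sequence[float], threshold: float) -> bool:
    # Hard, per-sample routing decision for one unit.
    return sigmoid(dot(GATE_WEIGHTS[unit], x) + GATE_BIAS[unit]) >= threshold


def active_units(x: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of the units whose gate is open for sample x."""
    return [i for i in range(len(GATE_WEIGHTS)) if gate_is_open(i, x, threshold)]


def gated_layer(x: Sequence[float], threshold: float = 0.5) -> List[float]:
    """ReLU responses; units whose gate is closed for x output 0."""
    outputs = []
    for i, w in enumerate(RESPONSE_WEIGHTS):
        response = max(0.0, dot(w, x))
        outputs.append(response if gate_is_open(i, x, threshold) else 0.0)
    return outputs


if __name__ == "__main__":
    for sample in ([1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [-1.0, -1.0]):
        print(sample, active_units(sample), gated_layer(sample))
```

Because the gate is a hard per-sample decision, nearby inputs such as `[1.0, 0.1]` and `[0.9, 0.0]` open the same unit, while dissimilar inputs are routed elsewhere. Hard gates of this kind are not differentiable. According to the abstract, the paper optimizes its model's dynamic decisions during training with the proposed Upstream-Downstream Learning algorithm; this sketch makes no attempt to reproduce that.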

Pages: 386–403

Page count: 18