Learning Generalized Representations for Open-Set Temporal Action Localization

Authors
Hu, Junshan [1]
Zhuang, Liansheng [1]
Dong, Weisong [1]
Ge, Shiming [2]
Wang, Shafei [3]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Guangdong, Peoples R China
Keywords
Video understanding; open-set temporal action localization; Transformer; generalization
DOI
10.1145/3581783.3612278
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open-set Temporal Action Localization (OSTAL) is a critical and challenging task that aims to recognize and temporally localize human actions in untrimmed videos in open-world scenarios. The main challenge in this task is transferring knowledge from known actions to unknown actions. However, existing methods rely on limited training data and overparameterized deep neural networks, and therefore generalize poorly. This paper proposes a novel Generalized OSTAL model (namely GOTAL) to learn generalized representations of actions. GOTAL uses a Transformer network to model actions and an open-set detection head to perform action localization and recognition. Benefiting from the Transformer's temporal modeling capabilities, GOTAL facilitates the extraction of human motion information from videos and mitigates the effects of irrelevant background data. Furthermore, a sharpness minimization algorithm is used to learn the network parameters of GOTAL; it drives the parameters towards flatter minima by simultaneously minimizing the training loss value and the sharpness of the loss landscape. Together, these components significantly enhance the generalization of the learned representation. Experimental results demonstrate that GOTAL achieves state-of-the-art performance on the THUMOS14 and ActivityNet1.3 benchmarks, confirming the effectiveness of the proposed method.
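The abstract does not say which sharpness minimization algorithm GOTAL uses. As an illustration only, the sketch below assumes the generic two-step sharpness-aware update: first move the weights to a nearby worst-case point along the normalized gradient, then update the original weights with the gradient taken at that point. The toy quadratic loss and every name here (`loss`, `grad`, `sam_step`, `train`, `lr`, `rho`) are hypothetical and are not taken from the paper.

```python
import math


def loss(w):
    # Toy quadratic loss standing in for a training loss (hypothetical).
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], w[1]]


def sam_step(w, lr=0.05, rho=0.05):
    """One sharpness-aware update on the weight list w."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Step 1: perturb the weights along the normalized gradient,
    # staying within an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: descend from the ORIGINAL weights using the gradient
    # evaluated at the perturbed point.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def train(w0, steps=200):
    # Repeat the sharpness-aware update for a fixed number of steps.
    w = list(w0)
    for _ in range(steps):
        w = sam_step(w)
    return w


if __name__ == "__main__":
    w_final = train([1.0, -2.0])
    print("final loss:", loss(w_final))
```

In this sketch `rho` sets the radius of the neighborhood whose worst-case loss is targeted. Setting `rho` to 0 reduces the update to plain gradient descent, which makes the effect of the sharpness term easy to compare on the same toy problem.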
Pages: 1987-1996
Number of pages: 10