Slowfast Diversity-aware Prototype Learning for Egocentric Action Recognition

Cited: 0
Authors
Dai, Guangzhao [1 ]
Shu, Xiangbo [1 ]
Yan, Rui [2 ]
Huang, Peng [1 ]
Tang, Jinhui [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ, Nanjing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Egocentric Action Recognition; Prototype Learning; Video Understanding
DOI
10.1145/3581783.3612144
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Egocentric Action Recognition (EAR) requires recognizing both the interacting objects (noun) and the motion (verb) against cluttered backgrounds with distracting objects. To capture interacting objects, traditional approaches rely heavily on costly object annotations or detectors, while a few works heuristically enumerate fixed sets of verb-constrained prototypes to roughly exclude the background. For capturing motion, the inherent variation of motion duration across egocentric videos of different lengths is largely ignored. To this end, we propose a novel Slowfast Diversity-aware Prototype learning (SDP) framework that effectively captures interacting objects by learning compact yet diverse prototypes, and adaptively captures motion in both long and short videos. Specifically, we present a new Part-to-Prototype (P2P) scheme that learns prototypes covering the interacting objects from raw videos by refining semantic information from the part level to the prototype level. Moreover, to adaptively capture motion, we design a new Slow-Fast Context (SFC) mechanism that explores Up/Down augmentations of the prototype representation at the semantic level: these strengthen the transient dynamic information in short videos and eliminate the redundant dynamic information in long videos, and are further complemented via slow- and fast-aware attentions. Extensive experiments demonstrate that SDP outperforms state-of-the-art methods on two large-scale egocentric video benchmarks, i.e., EPIC-KITCHENS-100 and EGTEA.
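To make the SFC idea in the abstract concrete, the following PyTorch-style sketch shows how such a mechanism could look. It is an illustration only, not the authors' implementation: the module name SlowFastContext, the use of linear interpolation for the Up/Down augmentations, and multi-head attention for the slow-/fast-aware attentions are all assumptions made for exposition.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SlowFastContext(nn.Module):
    """Hypothetical sketch of a Slow-Fast Context block over prototype features."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Slow-/fast-aware attentions: prototypes attend to the up-/down-sampled streams.
        self.slow_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fast_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, protos: torch.Tensor) -> torch.Tensor:
        # protos: (batch, T, dim) prototype representations over T temporal steps.
        x = protos.transpose(1, 2)                                    # (batch, dim, T)
        # "Up" augmentation: densify the temporal axis to strengthen transient dynamics.
        up = F.interpolate(x, scale_factor=2.0, mode="linear").transpose(1, 2)
        # "Down" augmentation: thin the temporal axis to drop redundant dynamics.
        down = F.interpolate(x, scale_factor=0.5, mode="linear").transpose(1, 2)
        s, _ = self.slow_attn(protos, up, up)                         # slow-aware attention
        f, _ = self.fast_attn(protos, down, down)                     # fast-aware attention
        return protos + self.fuse(torch.cat([s, f], dim=-1))          # residual fusion

# Example: 8 temporal steps of 256-dim prototypes for a batch of 2 videos.
sfc = SlowFastContext(dim=256)
out = sfc(torch.randn(2, 8, 256))    # -> shape (2, 8, 256)

The residual fusion keeps the original prototypes intact while injecting both temporal contexts, matching the abstract's description of the two streams fine-complementing each other.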
Pages: 7549-7558
Page count: 10
Related Papers
50 records in total
  • [1] Interactive Prototype Learning for Egocentric Action Recognition
    Wang, Xiaohan
    Zhu, Linchao
    Wang, Heng
    Yang, Yi
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 8148-8157
  • [2] Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
    Jeon, Jaehyeong
    Kim, Kibum
    Yoon, Kanghoon
    Park, Chanyoung
arXiv
  • [3] Federated Meta-Learning with Attention for Diversity-Aware Human Activity Recognition
    Shen, Qiang
    Feng, Haotian
    Song, Rui
    Song, Donglei
    Xu, Hao
    SENSORS, 2023, 23 (03)
  • [4] DivBO: Diversity-aware CASH for Ensemble Learning
    Shen, Yu
    Lu, Yupeng
    Li, Yang
    Tu, Yaofeng
    Zhang, Wentao
    Cui, Bin
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [5] Diversity-Aware Batch Active Learning for Dependency Parsing
    Shi, Tianze
    Benton, Adrian
    Malioutov, Igor
    Irsoy, Ozan
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021: 2616-2626
  • [6] Deep Metric Learning for Human Action Recognition with SlowFast Networks
    Shi, Shanmeng
    Jung, Cheolkon
2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021
  • [7] Diversity-aware population modeling
NATURE COMPUTATIONAL SCIENCE, 2025, 5 (3): 194-195
  • [8] Diversity-Aware Label Distribution Learning for Microscopy Auto Focusing
    Zhang, Chuyan
    Gu, Yun
    Yang, Jie
    Yang, Guang-Zhong
IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02): 1942-1949
  • [9] Diversity-aware retrieval of medical records
    Li, Jianqiang
    Liu, Chunchen
    Liu, Bo
    Mao, Rui
    Wang, Yongcai
    Chen, Shi
    Yang, Ji-Jiang
    Pan, Hui
    Wang, Qing
COMPUTERS IN INDUSTRY, 2015, 69: 81-91
  • [10] Learning Spatiotemporal Attention for Egocentric Action Recognition
    Lu, Minlong
    Liao, Danping
    Li, Ze-Nian
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019: 4425-4434