Video Attribute Prototype Network: A New Perspective for Zero-Shot Video Classification

Cited by: 0
Authors
Wang, Bo [1]
Zhao, Kaili [1]
Zhao, Hongyang [1]
Pu, Shi
Xiao, Bo [1]
Guo, Jun [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Keywords
DOI
10.1109/ICCVW60793.2023.00039
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video attributes, which leverage video contents to instantiate class semantics, play a critical role in diversifying semantics in zero-shot video classification, thereby facilitating semantic transfer from seen to unseen classes. However, few works discuss video attributes, and most methods treat class names as class semantics, which tend to be loosely defined. In this paper, we propose a Video Attribute Prototype Network (VAPNet) that generates video attributes by learning in-context semantics between video captions and class semantics. Specifically, we introduce a cross-attention module in the Transformer decoder that uses video captions as queries to probe and pool semantically associated class-wise features. To alleviate noise in pre-extracted captions, we learn caption features through a stochastic representation derived from a Gaussian representation in which the variance encodes uncertainty. We use a joint video-to-attribute and video-to-video contrastive loss to calibrate visual and semantic features. Experiments show that VAPNet significantly outperforms the state of the art (SoTA) by relative improvements of 14.3% on UCF101 and 8.8% on HMDB51, and further surpasses the pre-trained vision-language SoTA by 4.1% and 17.2%. Code is available.
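The two ingredients the abstract names, a Gaussian caption representation and a video-to-attribute contrastive loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the log-variance parameterization, the cosine-similarity InfoNCE form, and the temperature value are all assumptions made here for illustration only.

```python
import math
import random


def sample_caption_feature(mu, log_var, rng):
    """Reparameterized draw z = mu + sigma * eps with eps ~ N(0, 1).

    Assumption: the variance is stored as a log-variance per dimension.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(query, candidates, target, temperature=0.1):
    """Cross-entropy over temperature-scaled cosine similarities.

    Assumption: an InfoNCE-style loss stands in for the contrastive
    loss mentioned in the abstract; the exact form is hypothetical.
    """
    logits = [cosine(query, c) / temperature for c in candidates]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[target]


# Toy usage with hypothetical 3-dimensional features.
rng = random.Random(0)
mu = [1.0, 0.0, 0.0]
log_var = [-20.0, -20.0, -20.0]      # near-zero variance: low uncertainty
z = sample_caption_feature(mu, log_var, rng)

attributes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # one prototype per class
loss_match = info_nce(z, attributes, target=0)
loss_mismatch = info_nce(z, attributes, target=1)
```

With a near-zero variance the sampled feature stays close to its mean, so the loss is small when the target is the aligned prototype and large otherwise. A larger variance would spread the samples and soften that contrast, which is one way to read "variance encodes uncertainty".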
Pages: 315 - 324
Number of pages: 10
Related papers
50 in total
  • [31] Hierarchical Co-Attention Propagation Network for Zero-Shot Video Object Segmentation
    Pei, Gensheng
    Yao, Yazhou
    Shen, Fumin
    Huang, Dan
    Huang, Xingguo
    Shen, Heng-Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2348 - 2359
  • [32] MATNet: Motion-Attentive Transition Network for Zero-Shot Video Object Segmentation
    Zhou, Tianfei
    Li, Jianwu
    Wang, Shunzhou
    Tao, Ran
    Shen, Jianbing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 8326 - 8338
  • [33] Efficient and consistent zero-shot video generation with diffusion models
    Frakes, Ethan
    Khalid, Umar
    Chen, Chen
    REAL-TIME IMAGE PROCESSING AND DEEP LEARNING 2024, 2024, 13034
  • [34] Prompt-based Zero-shot Video Moment Retrieval
    Wang, Guolong
    Wu, Xun
    Liu, Zhaoyuan
    Yan, Junchi
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [35] Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization
    Pang, Zongshang
    Nakashima, Yuta
    Otani, Mayu
    Nagahara, Hajime
    JOURNAL OF IMAGING, 2024, 10 (09)
  • [36] Zero-Shot Video Grounding With Pseudo Query Lookup and Verification
    Lu, Yu
    Quan, Ruijie
    Zhu, Linchao
    Yang, Yi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1643 - 1654
  • [37] Dual Progressive Prototype Network for Generalized Zero-Shot Learning
    Wang, Chaoqun
    Min, Shaobo
    Chen, Xuejin
    Sun, Xiaoyan
    Li, Houqiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [38] Attribute subspaces for zero-shot learning
    Zhou, Lei
    Liu, Yang
    Bai, Xiao
    Li, Na
    Yu, Xiaohan
    Zhou, Jun
    Hancock, Edwin R.
    PATTERN RECOGNITION, 2023, 144
  • [39] Zero-Shot Learning with Attribute Selection
    Guo, Yuchen
    Ding, Guiguang
    Han, Jungong
    Tang, Sheng
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 6870 - 6877
  • [40] Attribute Distillation for Zero-Shot Recognition
    Li, Houjun
    Wei, Boquan
    Computer Engineering and Applications, 60 (09) : 219 - 227