Video Attribute Prototype Network: A New Perspective for Zero-Shot Video Classification

Cited: 0
Authors
Wang, Bo [1 ]
Zhao, Kaili [1 ]
Zhao, Hongyang [1 ]
Pu, Shi
Xiao, Bo [1 ]
Guo, Jun [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
DOI
10.1109/ICCVW60793.2023.00039
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Video attributes, which leverage video content to instantiate class semantics, play a critical role in diversifying semantics for zero-shot video classification, thereby facilitating semantic transfer from seen to unseen classes. However, few works discuss video attributes, and most methods use class names as class semantics, which tend to be loosely defined. In this paper, we propose a Video Attribute Prototype Network (VAPNet) that generates video attributes by learning in-context semantics between video captions and class semantics. Specifically, we introduce a cross-attention module into the Transformer decoder, treating video captions as queries that probe and pool semantic-associated class-wise features. To alleviate noise in pre-extracted captions, we learn caption features as a stochastic representation drawn from a Gaussian distribution whose variance encodes uncertainty. We further employ a joint video-to-attribute and video-to-video contrastive loss to calibrate visual and semantic features. Experiments show that VAPNet significantly outperforms the state of the art, with relative improvements of 14.3% on UCF101 and 8.8% on HMDB51, and further surpasses the pre-trained vision-language SoTA by 4.1% and 17.2%. Code is available.
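The mechanisms the abstract names (captions-as-queries cross-attention over class-wise features, a Gaussian caption representation whose variance encodes uncertainty, and a video-to-attribute contrastive loss) can be sketched in PyTorch as below. This is a minimal illustrative sketch, not the authors' released code: the module names, feature dimension, and the InfoNCE formulation of the contrastive loss are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeDecoder(nn.Module):
    """Hypothetical sketch of VAPNet's attribute generation.

    Caption features act as queries in a cross-attention layer over
    class-semantic features; captions are modeled as a Gaussian whose
    variance encodes the uncertainty of noisy pre-extracted captions.
    """
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.mu = nn.Linear(dim, dim)        # caption mean
        self.logvar = nn.Linear(dim, dim)    # caption log-variance
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, caption_feats, class_feats):
        # Stochastic caption representation via the reparameterization trick
        mu, logvar = self.mu(caption_feats), self.logvar(caption_feats)
        q = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # Captions-as-queries probe and pool class-wise semantic features
        attr, _ = self.cross_attn(q, class_feats, class_feats)
        return attr  # video attribute prototypes

def video_to_attribute_nce(video_emb, attr_emb, tau=0.07):
    """InfoNCE-style video-to-attribute contrastive loss (assumed form):
    matched video/attribute pairs lie on the diagonal of the logit matrix."""
    v = F.normalize(video_emb, dim=-1)
    a = F.normalize(attr_emb, dim=-1)
    logits = v @ a.t() / tau
    targets = torch.arange(v.size(0))
    return F.cross_entropy(logits, targets)
```

A symmetric video-to-video term of the same InfoNCE shape would complete the joint loss described in the abstract.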
Pages: 315 - 324
Page count: 10