Transductive Zero-Shot Action Recognition by Word-Vector Embedding

Cited by: 99
Authors
Xu, Xun [1]
Hospedales, Timothy [1]
Gong, Shaogang [1]
Affiliations
[1] Queen Mary Univ London, London, England
Keywords
Zero-shot action recognition; Zero-shot learning; Semantic embedding; Semi-supervised learning; Transfer learning; Action recognition
DOI
10.1007/s11263-016-0983-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The number of categories for action recognition is growing rapidly, and it has become increasingly hard to label sufficient training data for learning conventional models for all categories. Instead of collecting ever more data and labelling them exhaustively for all categories, an attractive alternative approach is "zero-shot learning" (ZSL). To that end, in this study we construct a mapping between visual features and a semantic descriptor of each action category, allowing new categories to be recognised in the absence of any visual training data. Existing ZSL studies focus primarily on still images and attribute-based semantic representations. In this work, we explore word-vectors as the shared semantic space to embed videos and category labels for ZSL action recognition. This is a more challenging problem than existing ZSL of still images and/or attributes, because the mapping between video space-time features of actions and the semantic space is more complex and harder to learn for the purpose of generalising over any cross-category domain shift. To solve this generalisation problem in ZSL action recognition, we investigate a series of synergistic strategies to improve upon the standard ZSL pipeline. Most of these strategies are transductive in nature, meaning that they require access to the testing data during the training phase. First, we significantly enhance the semantic space mapping by proposing manifold-regularized regression and data augmentation strategies. Second, we evaluate two existing post-processing strategies (transductive self-training and hubness correction), and show that they are complementary. We extensively evaluate our model on a wide range of human action datasets, including HMDB51, UCF101 and Olympic Sports, and event datasets, including CCV and TRECVID MED 13. The results demonstrate that our approach achieves state-of-the-art zero-shot action recognition performance with a simple and efficient pipeline, and without supervised annotation of attributes. Finally, we present an in-depth analysis of why and when zero-shot learning works, including demonstrating the ability to predict cross-category transferability in advance.
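The standard ZSL pipeline that the abstract builds on can be illustrated with a minimal sketch: regress visual features onto the word-vector of each training category, then label a test video by the nearest unseen-category word-vector. The code below is an illustration under stated assumptions, not the authors' implementation. It uses plain ridge regression (not the manifold-regularized variant), omits self-training, hubness correction and data augmentation, and runs on synthetic data. All names (`fit_ridge`, `zero_shot_predict`, `make_videos`, the matrix `B`) and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 5, 20                     # toy word-vector and visual-feature dimensions (assumed)
B = rng.normal(size=(k, d))      # hypothetical link from semantic space to visual space

def make_videos(prototypes, per_class, noise=0.01):
    """Generate synthetic 'video features' around each class word-vector."""
    labels = np.repeat(np.arange(len(prototypes)), per_class)
    X = prototypes[labels] @ B + noise * rng.normal(size=(len(labels), d))
    return X, labels

def fit_ridge(X, Z, lam=1.0):
    """Closed-form ridge regression: W = (X^T X + lam I)^-1 X^T Z."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Z)

def l2_normalize(A):
    """Scale each row to unit Euclidean norm."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

def zero_shot_predict(X_test, W, prototypes):
    """Project features into semantic space; pick the most cosine-similar prototype."""
    sims = l2_normalize(X_test @ W) @ l2_normalize(prototypes).T
    return sims.argmax(axis=1)

# Seen (training) and unseen (testing) categories have disjoint word-vectors.
seen_protos = rng.normal(size=(20, k))
unseen_protos = rng.normal(size=(3, k))

X_train, y_train = make_videos(seen_protos, per_class=30)
W = fit_ridge(X_train, seen_protos[y_train], lam=1.0)

X_test, y_test = make_videos(unseen_protos, per_class=50)
pred = zero_shot_predict(X_test, W, unseen_protos)
accuracy = float((pred == y_test).mean())
print(f"zero-shot accuracy on synthetic data: {accuracy:.2f}")
```

No visual example of an unseen category is used to fit `W`; only the unseen word-vectors are needed at test time, which is what makes the setting zero-shot. The synthetic data is linear by construction, so this toy case is far easier than real video features.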
Pages: 309-333
Number of pages: 25
Related papers
50 in total
  • [31] Deconfounding Causal Inference for Zero-Shot Action Recognition
    Wang, Junyan
    Jiang, Yiqi
    Long, Yang
    Sun, Xiuyu
    Pagnucco, Maurice
    Song, Yang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3976 - 3986
  • [32] Global Semantic Descriptors for Zero-Shot Action Recognition
    Estevam, Valter
    Laroca, Rayson
    Pedrini, Helio
    Menotti, David
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1843 - 1847
  • [33] Exploring Synonyms as Context in Zero-Shot Action Recognition
    Alexiou, Ioannis
    Xiang, Tao
    Gong, Shaogang
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 4190 - 4194
  • [34] A Generative Approach to Zero-Shot and Few-Shot Action Recognition
    Mishra, Ashish
    Verma, Vinay Kumar
    Reddy, M. Shiva Krishna
    Arulkumar, S.
    Rai, Piyush
    Mittal, Anurag
    2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 372 - 380
  • [35] Transductive Zero-Shot Learning by Decoupled Feature Generation
    Marmoreo, Federico
    Cavazza, Jacopo
    Murino, Vittorio
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 3108 - 3117
  • [36] Transductive Zero-Shot Hashing for Multilabel Image Retrieval
    Zou, Qin
    Cao, Ling
    Zhang, Zheng
    Chen, Long
    Wang, Song
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (04) : 1673 - 1687
  • [37] Transductive Zero-Shot Learning with Visual Structure Constraint
    Wan, Ziyu
    Chen, Dongdong
    Li, Yan
    Yan, Xingguang
    Zhang, Junge
    Yu, Yizhou
    Liao, Jing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [38] Transductive Multi-View Zero-Shot Learning
    Fu, Yanwei
    Hospedales, Timothy M.
    Xiang, Tao
    Gong, Shaogang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (11) : 2332 - 2345
  • [39] Zero-Shot Leaning with Manifold Embedding
    Yu, Yun-long
    Ji, Zhong
    Pang, Yan-wei
    INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING, 2018, 11266 : 135 - 147
  • [40] Generalized Zero-Shot Activity Recognition with Embedding-Based Method
    Wang, Wei
    Li, Qingzhong
    ACM TRANSACTIONS ON SENSOR NETWORKS, 2023, 19 (03)