Transductive Zero-Shot Action Recognition by Word-Vector Embedding

Cited by: 99
Authors
Xu, Xun [1 ]
Hospedales, Timothy [1 ]
Gong, Shaogang [1 ]
Affiliations
[1] Queen Mary Univ London, London, England
Keywords
Zero-shot action recognition; Zero-shot learning; Semantic embedding; Semi-supervised learning; Transfer learning; Action recognition;
DOI
10.1007/s11263-016-0983-5
CLC Number
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The number of categories for action recognition is growing rapidly, and it has become increasingly hard to label sufficient training data to learn conventional models for all categories. Instead of collecting ever more data and labelling it exhaustively for all categories, an attractive alternative approach is "zero-shot learning" (ZSL). To that end, in this study we construct a mapping between visual features and a semantic descriptor of each action category, allowing new categories to be recognised in the absence of any visual training data. Existing ZSL studies focus primarily on still images and attribute-based semantic representations. In this work, we explore word vectors as the shared semantic space in which to embed videos and category labels for ZSL action recognition. This is a more challenging problem than ZSL for still images and/or attributes, because the mapping between the space-time features of action videos and the semantic space is more complex and harder to learn, making it difficult to generalise over the cross-category domain shift. To solve this generalisation problem in ZSL action recognition, we investigate a series of synergistic strategies that improve upon the standard ZSL pipeline. Most of these strategies are transductive in nature, meaning they require access to the (unlabelled) testing data during the training phase. First, we significantly enhance the semantic space mapping by proposing manifold-regularized regression and data augmentation strategies. Second, we evaluate two existing post-processing strategies (transductive self-training and hubness correction) and show that they are complementary. We evaluate our model extensively on a wide range of human action datasets (HMDB51, UCF101 and Olympic Sports) and event datasets (CCV and TRECVID MED 13). The results demonstrate that our approach achieves state-of-the-art zero-shot action recognition performance with a simple and efficient pipeline, and without supervised annotation of attributes. Finally, we present an in-depth analysis of why and when zero-shot recognition works, including a demonstration that cross-category transferability can be predicted in advance.
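To make the pipeline described in the abstract concrete, the following is a minimal sketch in NumPy. It is an illustrative assumption, not the authors' released code: synthetic arrays stand in for real video features (e.g. dense trajectory descriptors) and for word2vec label embeddings, plain ridge regression stands in for the paper's manifold-regularized variant, and the dimensions, the ridge weight lam, and the single self-training round are all hypothetical choices.

```python
# Hedged sketch of a transductive word-vector ZSL pipeline (not the
# authors' code): regress visual features into the word-vector space,
# classify unseen-class test videos by cosine nearest neighbour, then
# apply one round of transductive self-training.
import numpy as np

rng = np.random.default_rng(0)
d_vis, d_sem = 512, 300                    # assumed visual / word-vector dims

# Synthetic stand-ins for real data.
X_train = rng.normal(size=(200, d_vis))    # seen-class video features
Z_train = rng.normal(size=(200, d_sem))    # word vectors of their labels
X_test  = rng.normal(size=(50, d_vis))     # unseen-class video features
prototypes = rng.normal(size=(5, d_sem))   # word vectors of 5 unseen labels

# 1. Learn a visual-to-semantic regression (closed-form ridge; the paper
#    uses a manifold-regularized variant of this step).
lam = 1.0
W = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d_vis),
                    X_train.T @ Z_train)   # shape (d_vis, d_sem)

def l2norm(A):
    return A / np.linalg.norm(A, axis=1, keepdims=True)

# 2. Project test videos into the semantic space and classify by cosine
#    similarity to the unseen-class word-vector prototypes.
P = l2norm(X_test @ W)
pred = (P @ l2norm(prototypes).T).argmax(axis=1)

# 3. One transductive self-training round: re-estimate each prototype as
#    the mean of the test projections currently assigned to it, pulling
#    prototypes toward the test distribution to reduce domain shift.
for k in range(len(prototypes)):
    if (pred == k).any():
        prototypes[k] = P[pred == k].mean(axis=0)
pred = (P @ l2norm(prototypes).T).argmax(axis=1)
print(pred[:10])
```

In the full method, this nearest-neighbour step is additionally combined with hubness correction, which down-weights "hub" prototypes that would otherwise attract a disproportionate share of test projections.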
Pages: 309-333
Page count: 25
Related Papers
50 items in total
  • [1] Transductive Zero-Shot Action Recognition by Word-Vector Embedding
    Xu, Xun
    Hospedales, Timothy
    Gong, Shaogang
    International Journal of Computer Vision, 2017, 123 : 309 - 333
  • [2] Coupling Adversarial Graph Embedding for transductive zero-shot action recognition
    Tian, Yi
    Huang, Yaping
    Xu, Wanru
    Kong, Yu
    NEUROCOMPUTING, 2021, 452 : 239 - 252
  • [3] Transductive Multi-view Embedding for Zero-Shot Recognition and Annotation
    Fu, Yanwei
    Hospedales, Timothy M.
    Xiang, Tao
    Fu, Zhenyong
    Gong, Shaogang
    COMPUTER VISION - ECCV 2014, PT II, 2014, 8690 : 584 - 599
  • [4] Transductive Unbiased Embedding for Zero-Shot Learning
    Song, Jie
    Shen, Chengchao
    Yang, Yezhou
    Liu, Yang
    Song, Mingli
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 1024 - 1033
  • [5] Transductive Learning With Prior Knowledge for Generalized Zero-Shot Action Recognition
    Su, Taiyi
    Wang, Hanli
    Qi, Qiuping
    Wang, Lei
    He, Bin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 260 - 273
  • [6] Semantic Embedding Space for Zero-Shot Action Recognition
    Xu, Xun
    Hospedales, Timothy
    Gong, Shaogang
    2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015, : 63 - 67
  • [7] Transductive Zero-Shot Learning With Adaptive Structural Embedding
    Yu, Yunlong
    Ji, Zhong
    Guo, Jichang
    Pang, Yanwei
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (09) : 4116 - 4127
  • [8] Transductive Visual-Semantic Embedding for Zero-shot Learning
    Xu, Xing
    Shen, Fumin
    Yang, Yang
    Shao, Jie
    Huang, Zi
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, : 41 - 49
  • [9] Manifold embedding for zero-shot recognition
    Ji, Zhong
    Yu, Xuejie
    Yu, Yunlong
    He, Yuqing
    COGNITIVE SYSTEMS RESEARCH, 2019, 55 : 34 - 43
  • [10] Transductive Zero-Shot Action Recognition via Visually Connected Graph Convolutional Networks
    Xu, Yangyang
    Han, Chu
    Qin, Jing
    Xu, Xuemiao
    Han, Guoqiang
    He, Shengfeng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (08) : 3761 - 3769