Transductive Zero-Shot Action Recognition by Word-Vector Embedding

Cited by: 99

Authors:

Xu, Xun [1]

Hospedales, Timothy [1]

Gong, Shaogang [1]

Affiliations:

[1] Queen Mary Univ London, London, England

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2017 / Volume 123 / Issue 03

Keywords:

Zero-shot action recognition; Zero-shot learning; Semantic embedding; Semi-supervised learning; Transfer learning; Action recognition;

DOI:

10.1007/s11263-016-0983-5

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The number of categories for action recognition is growing rapidly, and it has become increasingly hard to label sufficient training data for learning conventional models for all categories. Instead of collecting ever more data and labelling them exhaustively for all categories, an attractive alternative approach is "zero-shot learning" (ZSL). To that end, in this study we construct a mapping between visual features and a semantic descriptor of each action category, allowing new categories to be recognised in the absence of any visual training data. Existing ZSL studies focus primarily on still images and attribute-based semantic representations. In this work, we explore word-vectors as the shared semantic space to embed videos and category labels for ZSL action recognition. This is a more challenging problem than existing ZSL of still images and/or attributes, because the mapping between video space-time features of actions and the semantic space is more complex and harder to learn for the purpose of generalising over any cross-category domain shift. To solve this generalisation problem in ZSL action recognition, we investigate a series of synergistic strategies to improve upon the standard ZSL pipeline. Most of these strategies are transductive in nature, meaning that they have access to testing data in the training phase. First, we significantly enhance the semantic space mapping by proposing manifold-regularized regression and data augmentation strategies. Second, we evaluate two existing post-processing strategies (transductive self-training and hubness correction), and show that they are complementary. We extensively evaluate our model on a wide range of human action datasets, including HMDB51, UCF101 and Olympic Sports, and event datasets, including CCV and TRECVID MED 13. The results demonstrate that our approach achieves state-of-the-art zero-shot action recognition performance with a simple and efficient pipeline, and without supervised annotation of attributes. Finally, we present in-depth analysis into why and when zero-shot works, including demonstrating the ability to predict cross-category transferability in advance.
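
The core idea in the abstract, mapping visual features into a semantic space and matching them against descriptors of unseen categories, can be illustrated with a minimal sketch. This is not the authors' method: it swaps in plain ridge regression for the manifold-regularized regression, omits self-training and hubness correction, and runs on synthetic data. All names (`fit_ridge_mapping`, `zero_shot_predict`, `mixing`, the toy 3-dimensional "word vectors") are assumptions made for illustration only.

```python
import numpy as np

def fit_ridge_mapping(X, Z, lam=1e-3):
    """Closed-form ridge regression: find W so that X @ W approximates Z."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)

def l2_normalise(A):
    """Scale each row to unit length so dot products become cosine similarities."""
    return A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)

def zero_shot_predict(X, W, prototypes):
    """Project features into the semantic space and pick the nearest prototype."""
    projected = l2_normalise(X @ W)
    similarities = projected @ l2_normalise(prototypes).T
    return similarities.argmax(axis=1)

# Hypothetical stand-ins for word vectors of seen and unseen class names.
seen_vecs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                      [1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)
unseen_vecs = np.array([[1, 1, 1], [1, -1, 0]], dtype=float)

rng = np.random.default_rng(0)
# Assumed toy generative model: visual features are a noisy linear
# function of the class vector (real video features are far less tidy).
mixing = rng.normal(size=(3, 8))

def sample_videos(class_vecs, per_class):
    labels = np.repeat(np.arange(len(class_vecs)), per_class)
    noise = 0.01 * rng.normal(size=(len(labels), mixing.shape[1]))
    return class_vecs[labels] @ mixing + noise, labels

X_seen, y_seen = sample_videos(seen_vecs, 20)
X_unseen, y_unseen = sample_videos(unseen_vecs, 20)

# Train the visual-to-semantic mapping on seen classes only.
W = fit_ridge_mapping(X_seen, seen_vecs[y_seen])

# Recognise unseen classes without any of their visual training data.
predictions = zero_shot_predict(X_unseen, W, unseen_vecs)
accuracy = float((predictions == y_unseen).mean())
print(f"zero-shot accuracy on toy data: {accuracy:.2f}")
```

The toy data is deliberately easy because the seen class vectors span the whole semantic space. With real video features the learned mapping generalises less well to unseen categories, which is the domain-shift problem the abstract's transductive strategies are aimed at.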


Pages: 309-333

Page count: 25

