A bag-of-words equivalent recurrent neural network for action recognition

Cited by: 33
Authors
Richard, Alexander [1 ]
Gall, Juergen [1 ]
Affiliations
[1] Univ Bonn, Romerstrasse 164, D-53177 Bonn, Germany
Funding
European Research Council;
Keywords
Action recognition; Bag-of-words; Neural networks; DESCRIPTORS; CATEGORIES;
DOI
10.1016/j.cviu.2016.10.014
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The traditional bag-of-words approach has found a wide range of applications in computer vision. The standard pipeline consists of generating a visual vocabulary, quantizing the features into histograms of visual words, and a classification step, for which a support vector machine combined with a non-linear kernel is usually used. Given large amounts of data, however, the model suffers from a lack of discriminative power. This applies particularly to action recognition, where the vast number of video features needs to be subsampled for unsupervised visual vocabulary generation. Moreover, the kernel computation can be very expensive on large datasets. In this work, we propose a recurrent neural network that is equivalent to the traditional bag-of-words approach but enables discriminative training. The model further allows the kernel computation to be incorporated directly into the neural network, which solves the complexity issue and allows the complete classification system to be represented within a single network. We evaluate our method on four recent action recognition benchmarks and show that it outperforms the conventional model as well as sparse coding methods. (C) 2016 Elsevier Inc. All rights reserved.
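The following is a toy illustration, not the authors' network, of why a recurrent formulation can reproduce a bag-of-words histogram: counting hard assignments over all features gives the same result as updating a running mean one feature at a time. The function names, the random vocabulary (standing in for one learned by clustering), and the running-mean update are assumptions made for this sketch only.

```python
import random

def nearest_word(x, vocab):
    """Index of the visual word closest to feature x (squared Euclidean)."""
    dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in vocab]
    return dists.index(min(dists))

def histogram_batch(features, vocab):
    """Classic pipeline step: quantize every feature, count, normalize."""
    counts = [0] * len(vocab)
    for x in features:
        counts[nearest_word(x, vocab)] += 1
    return [c / len(features) for c in counts]

def histogram_recurrent(features, vocab):
    """Same histogram as a recurrence over the feature sequence:
    h_t = h_{t-1} + (a_t - h_{t-1}) / t, with a_t the one-hot assignment."""
    h = [0.0] * len(vocab)
    for t, x in enumerate(features, start=1):
        k = nearest_word(x, vocab)
        h = [hj + ((1.0 if j == k else 0.0) - hj) / t for j, hj in enumerate(h)]
    return h

# Hypothetical data: a random 5-word vocabulary and 200 random 4-d features.
random.seed(0)
vocab = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
features = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]

h_batch = histogram_batch(features, vocab)
h_rec = histogram_recurrent(features, vocab)
max_diff = max(abs(a - b) for a, b in zip(h_batch, h_rec))
```

Here `max_diff` is zero up to floating-point rounding. In this sketch nothing is trainable; the point of the abstract is that once the histogram is written as a network, the vocabulary itself can be trained discriminatively.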
Pages: 79-91
Page count: 13