Action Recognition in Still Images Using Word Embeddings from Natural Language Descriptions

Cited by: 3
Authors
Sharma, Karan [1]
Kumar, Arun C. S. [1]
Bhandarkar, Suchendra M. [1]
Affiliations
[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA
Keywords
action recognition; natural language processing; word2vec model; Object-Verb-Object triplet model
DOI
10.1109/WACVW.2017.17
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Detecting actions or verbs in still images is challenging for several reasons, including the absence of temporal information and the polysemy of verbs, which make it difficult to generate large verb datasets. In this paper, we propose to first detect the prominent objects in the image and then infer the relevant actions or verbs using Natural Language Processing (NLP)-based techniques. The proposed scheme obviates the need to train and apply visual action detectors on images, an approach that tends to be error-prone and computationally intensive. The key insight is that detecting a few significant (i.e., top) objects in an image allows one to infer the relevant actions or verbs in the image. To this end, we propose NLP-based approaches that rely on the word2vec and Object-Verb-Object triplet models to predict actions from top-object detections, and we analyze their nuances. Our experimental results show that verbs can be reliably and efficiently detected by exploiting the top object detections in an image.
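The idea in the abstract, inferring verbs from detected objects through word embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors, the `TOY_EMBEDDINGS` table, and the `rank_verbs` helper are all hypothetical and invented for illustration, and scoring by mean cosine similarity is an assumed choice. A real system would presumably load pretrained word2vec vectors and take the object labels from an object detector.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# A real pipeline would load pretrained word2vec vectors instead.
TOY_EMBEDDINGS = {
    "horse":  [0.9, 0.1, 0.0],
    "person": [0.5, 0.5, 0.2],
    "ride":   [0.8, 0.3, 0.1],
    "eat":    [0.1, 0.9, 0.2],
    "type":   [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_verbs(objects, verbs, embeddings):
    """Rank candidate verbs by mean cosine similarity to the detected objects.

    Assumed scoring rule for this sketch; the paper may score verbs differently.
    """
    scored = []
    for verb in verbs:
        sims = [cosine(embeddings[verb], embeddings[obj]) for obj in objects]
        scored.append((verb, sum(sims) / len(sims)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Pretend an object detector returned these top objects for an image.
top_objects = ["horse", "person"]
ranking = rank_verbs(top_objects, ["ride", "eat", "type"], TOY_EMBEDDINGS)
print([verb for verb, _ in ranking])
```

With these toy vectors, "ride" scores highest for the objects "horse" and "person". The ranking depends entirely on the made-up embeddings and says nothing about the paper's reported results.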
Pages: 58-66
Page count: 9
Related papers
50 in total
  • [31] Loss Guided Activation for Action Recognition in Still Images
    Liu, Lu
    Tan, Robby T.
    You, Shaodi
    COMPUTER VISION - ACCV 2018, PT V, 2019, 11365 : 152 - 167
  • [32] Learning Hierarchical Context for Action Recognition in Still Images
    Zhu, Haisheng
    Hu, Jian-Fang
    Zheng, Wei-Shi
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT III, 2018, 11166 : 67 - 77
  • [33] Transfer learning with fine tuning for human action recognition from still images
    Chakraborty, Saikat
    Mondal, Riktim
    Singh, Pawan Kumar
    Sarkar, Ram
    Bhattacharjee, Debotosh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (13) : 20547 - 20578
  • [35] Action Recognition from Still Images Based on Deep VLAD Spatial Pyramids
    Yan, Shiyang
    Smith, Jeremy S.
    Zhang, Bailing
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2017, 54 : 118 - 129
  • [36] Action recognition in still images by learning spatial interest regions from videos
    Eweiwi, Abdalrahman
    Cheema, Muhammad Shahzad
    Bauckhage, Christian
    PATTERN RECOGNITION LETTERS, 2015, 51 : 8 - 15
  • [37] Word Recognition in Natural Scene and Video Images using Hidden Markov Model
    Roy, Sangheeta
    Roy, Partha Pratim
    Shivakumara, Palaiahnakote
    Pal, Umapada
    2013 FOURTH NATIONAL CONFERENCE ON COMPUTER VISION, PATTERN RECOGNITION, IMAGE PROCESSING AND GRAPHICS (NCVPRIPG), 2013
  • [38] 3D lithological mapping of borehole descriptions using word embeddings
    Fuentes, Ignacio
    Padarian, Jose
    Iwanaga, Takuya
    Vervoort, R. Willem
    COMPUTERS & GEOSCIENCES, 2020, 141
  • [39] LEARNING DISCRIMINATIVE ACTION AND CONTEXT REPRESENTATIONS FOR ACTION RECOGNITION IN STILL IMAGES
    Xin, Miao
    Zhang, Hong
    Yuan, Ding
    Sun, Mingui
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 757 - 762
  • [40] RECURRENT NEURAL NETWORK LANGUAGE MODEL WITH STRUCTURED WORD EMBEDDINGS FOR SPEECH RECOGNITION
    He, Tianxing
    Xiang, Xu
    Qian, Yanmin
    Yu, Kai
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5396 - 5400