Action Recognition in Still Images Using Word Embeddings from Natural Language Descriptions

Cited by: 3
Authors
Sharma, Karan [1]
Kumar, Arun C. S. [1]
Bhandarkar, Suchendra M. [1]
Affiliation
[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA
Keywords
action recognition; natural language processing; word2vec model; Object-Verb-Object triplet model
DOI
10.1109/WACVW.2017.17
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Detecting actions or verbs in still images is challenging for several reasons, including the absence of temporal information and the polysemy of verbs, which make it difficult to build large verb datasets. In this paper, we propose to first detect the prominent objects in the image and then infer the relevant actions or verbs using Natural Language Processing (NLP)-based techniques. The proposed scheme obviates the need to train and apply visual action detectors on images, an approach that tends to be error-prone and computationally intensive. The key insight of this paper is that detecting a few significant (i.e., top) objects in an image allows one to extract or infer the relevant actions or verbs in the image. To this end, we propose NLP-based approaches that rely on the word2vec and Object-Verb-Object triplet models to predict actions from top-object detections, and we analyze their nuances. Our experimental results show that verbs can be reliably and efficiently detected by exploiting the top object detections in an image.
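The idea of inferring verbs from detected objects can be illustrated with a minimal sketch. It assumes that the top object labels are already available from some detector and that candidate verbs are ranked by cosine similarity to the averaged object embedding. The abstract does not specify the scoring rule, so this averaging scheme is an assumption and not necessarily the authors' method. The vectors in `EMBEDDINGS` and the function name `rank_verbs` are hypothetical; a real system would load pretrained word2vec vectors instead.

```python
import math

# Hypothetical 3-dimensional toy embeddings, for illustration only.
# A real system would use pretrained word2vec vectors here.
EMBEDDINGS = {
    "horse": [0.9, 0.1, 0.0],
    "person": [0.5, 0.5, 0.2],
    "ball": [0.1, 0.9, 0.1],
    "ride": [0.8, 0.2, 0.1],
    "kick": [0.2, 0.9, 0.0],
    "read": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_verbs(top_objects, candidate_verbs, embeddings=EMBEDDINGS):
    """Rank candidate verbs by similarity to the mean object embedding.

    Objects and verbs missing from the embedding table are skipped.
    Returns a list of (verb, score) pairs, best match first.
    """
    vectors = [embeddings[o] for o in top_objects if o in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    # Average the object vectors into a single query vector (an assumption).
    query = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    scored = [
        (verb, cosine(query, embeddings[verb]))
        for verb in candidate_verbs
        if verb in embeddings
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Top objects as they might come from an object detector.
    for verb, score in rank_verbs(["horse", "person"], ["ride", "kick", "read"]):
        print(f"{verb}: {score:.3f}")
```

With these toy vectors, the objects "horse" and "person" rank "ride" first, while "ball" and "person" rank "kick" first. No visual action detector is involved; only object labels and word vectors are used, which is the property the abstract emphasizes.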
Pages: 58-66
Number of pages: 9