Action Recognition in Still Images Using Word Embeddings from Natural Language Descriptions

Cited by: 3

Authors:

Sharma, Karan [1]

Kumar, Arun C. S. [1]

Bhandarkar, Suchendra M. [1]

Affiliations:

[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA

Source:

2017 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW) | 2017

Keywords:

action recognition; natural language processing; word2vec model; Object-Verb-Object triplet model;

DOI:

10.1109/WACVW.2017.17

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Detecting actions or verbs in still images is a challenging problem for a variety of reasons such as the absence of temporal information and polysemy of verbs which lead to difficulty in generating large verb datasets. In this paper, we propose to first detect the prominent objects in the image and then infer the relevant actions or verbs using Natural Language Processing (NLP)-based techniques. The proposed scheme obviates the need for training and using visual action detectors on images, an approach which tends to be error-prone and computationally intensive. This paper provides a valuable insight in that the detection of a few significant (i.e., top) objects in an image allows one to extract or infer the relevant actions or verbs in the image. To this end, we propose NLP-based approaches relying on the word2vec and the Object-Verb-Object triplet models for predicting the actions from top-object detections and also analyze their nuances. Our experimental results show that verbs can be reliably and efficiently detected by exploiting the top object detections in an image.
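The core idea in the abstract, inferring verbs from the top detected objects through word embeddings, can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors in `TOY_EMBEDDINGS`, the helper names `cosine` and `rank_verbs`, and the choice to average cosine similarity over the detected objects are all assumptions made for illustration. A real system would load word2vec vectors trained on a large text corpus.

```python
import math

# Hypothetical toy embeddings (assumption): real word2vec vectors are learned
# from text and typically have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "horse": [0.9, 0.1, 0.0],
    "person": [0.5, 0.5, 0.1],
    "guitar": [0.0, 0.2, 0.9],
    "ride": [0.8, 0.3, 0.0],
    "play": [0.1, 0.3, 0.8],
    "eat": [0.3, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_verbs(objects, verbs, emb=TOY_EMBEDDINGS):
    """Rank candidate verbs by mean cosine similarity to the detected objects.

    Averaging over objects is an illustrative scoring rule (assumption),
    not necessarily the one used in the paper.
    """
    scores = {}
    for verb in verbs:
        sims = [cosine(emb[obj], emb[verb]) for obj in objects]
        scores[verb] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Object labels would come from an object detector run on the image.
    print(rank_verbs(["horse", "person"], ["ride", "play", "eat"]))
```

With these toy vectors, the objects "horse" and "person" rank "ride" first, while "guitar" and "person" rank "play" first. No visual action detector is involved, which is the point the abstract makes.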


Pages: 58-66

Page count: 9

50 records in total

[31] Loss Guided Activation for Action Recognition in Still Images
Liu, Lu
Tan, Robby T.
You, Shaodi
COMPUTER VISION - ACCV 2018, PT V, 2019, 11365 : 152 - 167
[32] Learning Hierarchical Context for Action Recognition in Still Images
Zhu, Haisheng
Hu, Jian-Fang
Zheng, Wei-Shi
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT III, 2018, 11166 : 67 - 77
[33] Transfer learning with fine tuning for human action recognition from still images
Chakraborty, Saikat
Mondal, Riktim
Singh, Pawan Kumar
Sarkar, Ram
Bhattacharjee, Debotosh
MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (13) : 20547 - 20578
[34] Transfer learning with fine tuning for human action recognition from still images
Saikat Chakraborty
Riktim Mondal
Pawan Kumar Singh
Ram Sarkar
Debotosh Bhattacharjee
Multimedia Tools and Applications, 2021, 80 : 20547 - 20578
[35] Action Recognition from Still Images Based on Deep VLAD Spatial Pyramids
Yan, Shiyang
Smith, Jeremy S.
Zhang, Bailing
SIGNAL PROCESSING-IMAGE COMMUNICATION, 2017, 54 : 118 - 129
[36] Action recognition in still images by learning spatial interest regions from videos
Eweiwi, Abdalrahman
Cheema, Muhammad Shahzad
Bauckhage, Christian
PATTERN RECOGNITION LETTERS, 2015, 51 : 8 - 15
[37] Word Recognition in Natural Scene and Video Images using Hidden Markov Model
Roy, Sangheeta
Roy, Partha Pratim
Shivakumara, Palaiahnakote
Pal, Umapada
2013 FOURTH NATIONAL CONFERENCE ON COMPUTER VISION, PATTERN RECOGNITION, IMAGE PROCESSING AND GRAPHICS (NCVPRIPG), 2013,
[38] 3D lithological mapping of borehole descriptions using word embeddings
Fuentes, Ignacio
Padarian, Jose
Iwanaga, Takuya
Vervoort, R. Willem
COMPUTERS & GEOSCIENCES, 2020, 141
[39] LEARNING DISCRIMINATIVE ACTION AND CONTEXT REPRESENTATIONS FOR ACTION RECOGNITION IN STILL IMAGES
Xin, Miao
Zhang, Hong
Yuan, Ding
Sun, Mingui
2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 757 - 762
[40] RECURRENT NEURAL NETWORK LANGUAGE MODEL WITH STRUCTURED WORD EMBEDDINGS FOR SPEECH RECOGNITION
He, Tianxing
Xiang, Xu
Qian, Yanmin
Yu, Kai
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5396 - 5400
