Let the robot tell: Describe car image with natural language via LSTM

Cited by: 13
Authors
Chen, Long [1 ]
He, Yuhang [1 ]
Fan, Lei [1 ]
Affiliations
[1] Sun Yat-sen University, Guangzhou 510006, Guangdong, People's Republic of China
DOI
10.1016/j.patrec.2017.09.007
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image-based car detection and classification have been central research topics in self-driving for decades. Natural language description of car images, however, remains largely unexplored, even though a human can describe an image in sentences after a single glance. In this paper, we present an end-to-end trainable, spatial-temporal deep recurrent neural network, an LSTM (Long Short-Term Memory), that automatically converts car images into human-understandable natural language descriptions. Our model builds on state-of-the-art progress in computer vision and machine translation: we extract car region proposals with Region-based Convolutional Neural Networks (R-CNN) and embed them into fixed-size vectors. Each word in a sentence is likewise embedded, through a local-global context-aware neural network, into a real-valued vector of the same size as the image vectors. During training, the LSTM is fed image-sentence pairs sequentially and trained to maximize the joint probability of the target word at each time step. At test time, the trained LSTM receives a car image and predicts a natural language description word by word. Finally, we evaluate our model on static/dynamic car-attribute description using both the 30,000-image CompCar dataset [21] and a 1000-video dataset collected in street scenarios by our self-driving car, with the quantitative BLEU score and a subjective human-rating system as evaluation metrics. We also test the model's generalization ability, its transferability to car property classification, and the impact of different image feature extractors. Experimental results show the superiority and robustness of our model (refer to www.carlib.net/carimg2text.html for more experiment results). (C) 2017 Elsevier B.V. All rights reserved.
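As a rough illustration of the pipeline the abstract describes, below is a minimal, self-contained sketch in PyTorch: an image feature vector (standing in for the R-CNN region embedding) is projected into the word-embedding space and seeds an LSTM, which is trained with teacher forcing to maximize the log-probability of the target word at each time step and, at test time, decodes a description word by word. This is not the authors' released code; all names (CarCaptioner, describe, the dimensions, and the stand-in tensors) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the image-to-sentence pipeline the
# abstract describes: a CNN/R-CNN image feature is projected into the word
# embedding space and seeds an LSTM that predicts the caption one word per step.
import torch
import torch.nn as nn

class CarCaptioner(nn.Module):  # name is an illustrative assumption
    def __init__(self, vocab_size, feat_dim=4096, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)   # image -> embedding space
        self.word_emb = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)     # per-step word scores

    def forward(self, img_feat, words):
        # img_feat: (B, feat_dim); words: (B, T-1), all caption words but the last.
        img_tok = self.img_proj(img_feat).unsqueeze(1)           # (B, 1, E)
        seq = torch.cat([img_tok, self.word_emb(words)], dim=1)  # image token first
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                                  # (B, T, vocab)

# Training: teacher forcing; maximizing sum_t log p(w_t | image, w_<t) equals
# minimizing per-step cross-entropy against the ground-truth next word.
model = CarCaptioner(vocab_size=10000)
loss_fn = nn.CrossEntropyLoss()
img_feat = torch.randn(8, 4096)             # stand-in image features
caps = torch.randint(1, 10000, (8, 15))     # stand-in ground-truth captions
logits = model(img_feat, caps[:, :-1])      # step t predicts caps[:, t]
loss = loss_fn(logits.reshape(-1, 10000), caps.reshape(-1))
loss.backward()

@torch.no_grad()
def describe(model, img_feat, eos_idx, max_len=20):
    """Test stage: greedy word-by-word decoding for a single image (B = 1)."""
    x = model.img_proj(img_feat).unsqueeze(1)    # start from the image token
    state, words = None, []
    for _ in range(max_len):
        out, state = model.lstm(x, state)
        w = model.out(out[:, -1]).argmax(dim=-1) # most probable next word
        if w.item() == eos_idx:                  # stop at end-of-sentence
            break
        words.append(w.item())
        x = model.word_emb(w).unsqueeze(1)       # feed the prediction back in
    return words
```

The generated word indices would then be mapped back to vocabulary tokens and scored against reference sentences, for example with BLEU as in the paper's evaluation.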
Pages: 75-82
Page count: 8
Related papers
9 items in total
  • [1] Teaching Machines to Describe Images via Natural Language Feedback
    Ling, Huan
    Fidler, Sanja
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [2] Tell Your Robot What to Do: Evaluation of Natural Language Models for Robot Command Processing
    Kramer, Erick Romero
    Sainz, Argentina Ortega
    Mitrevski, Alex
    Ploeger, Paul G.
    ROBOT WORLD CUP XXIII, ROBOCUP 2019, 2019, 11531 : 255 - 267
  • [3] Robot Program Construction via Grounded Natural Language Semantics & Simulation
    Pomarlan, Mihai
    Bateman, John
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS' 18), 2018, : 857 - 864
  • [4] Automatic Translation of Spanish Natural Language Commands to Control Robot Commands Based on LSTM Neural Network
    Suarez Bonilla, Felix David
    Ruiz Ugalde, Federico
    2019 THIRD IEEE INTERNATIONAL CONFERENCE ON ROBOTIC COMPUTING (IRC 2019), 2019, : 125 - 131
  • [5] Conversational Fashion Image Retrieval via Multiturn Natural Language Feedback
    Yuan, Yifei
    Lam, Wai
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 839 - 848
  • [6] Natural language understanding based on mental image description language L-md and its application to language-centered robot manipulation
    Yokota, Masao
    Sugita, Kaoru
    Oka, Tetsushi
    ARTIFICIAL LIFE AND ROBOTICS, 2008, 13 (01) : 84 - 88
  • [7] The Latent Semantic Power of Labels: Improving Image Classification via Natural Language Semantic
    Jia, Haosen
    Yao, Hong
    Tian, Tian
    Yan, Cheng
    Li, Shengwen
    HUMAN CENTERED COMPUTING, 2019, 11956 : 175 - 189
  • [8] Tell me when and why to do it! Run-time Planner Model Updates Via Natural Language Instruction
    Cantrell, Rehj
    Talamadupula, Kartik
    Schermerhorn, Paul
    Benton, J.
    Kambhampati, Subbarao
    Scheutz, Matthias
    HRI'12: PROCEEDINGS OF THE SEVENTH ANNUAL ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2012, : 471 - 478
  • [9] Towards Malicious Action Detection for Nuclear Security via Integrated Deep Learning Based Image Recognition and Natural Language Processing
    Demachi, Kazuyuki
    Sudo, Masaki
    Chen, Shi
    PROCEEDINGS OF 2021 28TH INTERNATIONAL CONFERENCE ON NUCLEAR ENGINEERING (ICONE28), VOL 3, 2021,