Let the robot tell: Describe car image with natural language via LSTM

Cited by: 13
Authors
Chen, Long [1 ]
He, Yuhang [1 ]
Fan, Lei [1 ]
Affiliations
[1] Sun Yat-sen University, Guangzhou 510006, Guangdong, People's Republic of China
DOI
10.1016/j.patrec.2017.09.007
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image-based car detection and classification have been central research topics in self-driving for decades. Natural language description of car images, however, remains largely unexplored, even though a human can describe an image in sentences after a single glance. In this paper, we present an end-to-end trainable, spatial-temporal deep recurrent neural network, an LSTM (Long Short-Term Memory), that automatically converts car images into human-understandable natural language descriptions. Our model builds on state-of-the-art progress in computer vision and machine translation: we extract car region proposals with Region-based Convolutional Neural Networks (R-CNN) and embed them into fixed-size vectors. Each word in a sentence is likewise embedded, through a local-global context-aware neural network, into a real-valued vector of the same size as the image vectors. During training, the LSTM is fed image-sentence pairs sequentially and trained to maximize the joint probability of the target word at each time step. At test time, the trained LSTM receives a car image and predicts a natural language description word by word. Finally, we evaluate our model on static/dynamic car-attribute description using both the 30,000-image CompCar dataset [21] and a 1000-video dataset collected in street scenarios by our self-driving car, with the quantitative BLEU score and a subjective human-rating system as evaluation metrics. We also test the model's generalization ability, its transferability to car property classification, and the impact of different image feature extractors. Experimental results show the superiority and robustness of our model (refer to www.carlib.net/carimg2text.html for more experiment results). (C) 2017 Elsevier B.V. All rights reserved.
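As a rough illustration of the pipeline the abstract describes, below is a minimal, self-contained sketch in PyTorch: an image feature vector (standing in for the R-CNN region embedding) is projected into the word-embedding space and seeds an LSTM, which is trained with teacher forcing to maximize the log-probability of the target word at each time step and, at test time, decodes a description word by word. This is not the authors' released code; all names (CarCaptioner, describe, the dimensions, and the stand-in tensors) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the image-to-sentence pipeline the
# abstract describes: a CNN/R-CNN image feature is projected into the word
# embedding space and seeds an LSTM that predicts the caption one word per step.
import torch
import torch.nn as nn

class CarCaptioner(nn.Module):  # name is an illustrative assumption
    def __init__(self, vocab_size, feat_dim=4096, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)   # image -> embedding space
        self.word_emb = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)     # per-step word scores

    def forward(self, img_feat, words):
        # img_feat: (B, feat_dim); words: (B, T-1), all caption words but the last.
        img_tok = self.img_proj(img_feat).unsqueeze(1)           # (B, 1, E)
        seq = torch.cat([img_tok, self.word_emb(words)], dim=1)  # image token first
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                                  # (B, T, vocab)

# Training: teacher forcing; maximizing sum_t log p(w_t | image, w_<t) equals
# minimizing per-step cross-entropy against the ground-truth next word.
model = CarCaptioner(vocab_size=10000)
loss_fn = nn.CrossEntropyLoss()
img_feat = torch.randn(8, 4096)             # stand-in image features
caps = torch.randint(1, 10000, (8, 15))     # stand-in ground-truth captions
logits = model(img_feat, caps[:, :-1])      # step t predicts caps[:, t]
loss = loss_fn(logits.reshape(-1, 10000), caps.reshape(-1))
loss.backward()

@torch.no_grad()
def describe(model, img_feat, eos_idx, max_len=20):
    """Test stage: greedy word-by-word decoding for a single image (B = 1)."""
    x = model.img_proj(img_feat).unsqueeze(1)    # start from the image token
    state, words = None, []
    for _ in range(max_len):
        out, state = model.lstm(x, state)
        w = model.out(out[:, -1]).argmax(dim=-1) # most probable next word
        if w.item() == eos_idx:                  # stop at end-of-sentence
            break
        words.append(w.item())
        x = model.word_emb(w).unsqueeze(1)       # feed the prediction back in
    return words
```

The generated word indices would then be mapped back to vocabulary tokens and scored against reference sentences, for example with BLEU as in the paper's evaluation.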
Pages: 75-82
Page count: 8
Related papers
9 items in total
  • [1] Teaching Machines to Describe Images via Natural Language Feedback
    Ling, Huan
    Fidler, Sanja
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [2] Tell Your Robot What to Do: Evaluation of Natural Language Models for Robot Command Processing
    Kramer, Erick Romero
    Sainz, Argentina Ortega
    Mitrevski, Alex
    Ploeger, Paul G.
    ROBOT WORLD CUP XXIII, ROBOCUP 2019, 2019, 11531 : 255 - 267
  • [3] Robot Program Construction via Grounded Natural Language Semantics & Simulation
    Pomarlan, Mihai
    Bateman, John
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS' 18), 2018, : 857 - 864
  • [4] Automatic Translation of Spanish Natural Language Commands to Control Robot Commands Based on LSTM Neural Network
    Suarez Bonilla, Felix David
    Ruiz Ugalde, Federico
    2019 THIRD IEEE INTERNATIONAL CONFERENCE ON ROBOTIC COMPUTING (IRC 2019), 2019, : 125 - 131
  • [5] Conversational Fashion Image Retrieval via Multiturn Natural Language Feedback
    Yuan, Yifei
    Lam, Wai
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 839 - 848
  • [6] Natural language understanding based on mental image description language L-md and its application to language-centered robot manipulation
    Yokota, Masao
    Sugita, Kaoru
    Oka, Tetsushi
    ARTIFICIAL LIFE AND ROBOTICS, 2008, 13 (01) : 84 - 88
  • [7] The Latent Semantic Power of Labels: Improving Image Classification via Natural Language Semantic
    Jia, Haosen
    Yao, Hong
    Tian, Tian
    Yan, Cheng
    Li, Shengwen
    HUMAN CENTERED COMPUTING, 2019, 11956 : 175 - 189
  • [8] Tell me when and why to do it! Run-time Planner Model Updates Via Natural Language Instruction
    Cantrell, Rehj
    Talamadupula, Kartik
    Schermerhorn, Paul
    Benton, J.
    Kambhampati, Subbarao
    Scheutz, Matthias
    HRI'12: PROCEEDINGS OF THE SEVENTH ANNUAL ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2012, : 471 - 478
  • [9] Towards Malicious Action Detection for Nuclear Security via Integrated Deep Learning Based Image Recognition and Natural Language Processing
    Demachi, Kazuyuki
    Sudo, Masaki
    Chen, Shi
    PROCEEDINGS OF 2021 28TH INTERNATIONAL CONFERENCE ON NUCLEAR ENGINEERING (ICONE28), VOL 3, 2021,