Captioning Images with Diverse Objects

Cited by: 83
Authors
Venugopalan, Subhashini [1]
Mooney, Raymond [1]
Hendricks, Lisa Anne [2]
Darrell, Trevor [2]
Rohrbach, Marcus [2]
Saenko, Kate [3]
Affiliations
[1] UT Austin, Austin, TX 78712 USA
[2] Univ Calif Berkeley, Berkeley, CA USA
[3] Boston Univ, Boston, MA 02215 USA
DOI
10.1109/CVPR.2017.130
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent captioning models are limited in their ability to scale and describe concepts unseen in paired image-text corpora. We propose the Novel Object Captioner (NOC), a deep visual semantic captioning model that can describe a large number of object categories not present in existing image-caption datasets. Our model takes advantage of external sources: labeled images from object recognition datasets, and semantic knowledge extracted from unannotated text. We propose minimizing a joint objective which can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets. We demonstrate that our model exploits semantic information to generate captions for hundreds of object categories in the ImageNet object recognition dataset that are not observed in MSCOCO image-caption training data, as well as many categories that are observed very rarely. Both automatic evaluations and human judgements show that our model considerably outperforms prior work in being able to describe many more categories of objects.
Pages: 1170-1178
Page count: 9