Captioning Images with Diverse Objects

Cited by: 83
Authors
Venugopalan, Subhashini [1]
Mooney, Raymond [1]
Hendricks, Lisa Anne [2]
Darrell, Trevor [2]
Rohrbach, Marcus [2]
Saenko, Kate [3]
Affiliations
[1] UT Austin, Austin, TX 78712 USA
[2] University of California, Berkeley, Berkeley, CA USA
[3] Boston University, Boston, MA 02215 USA
DOI: 10.1109/CVPR.2017.130
Chinese Library Classification: TP18 [Artificial intelligence theory]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
Recent captioning models are limited in their ability to scale and to describe concepts unseen in paired image-text corpora. We propose the Novel Object Captioner (NOC), a deep visual semantic captioning model that can describe a large number of object categories not present in existing image-caption datasets. Our model takes advantage of external sources: labeled images from object recognition datasets, and semantic knowledge extracted from unannotated text. We propose minimizing a joint objective that can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets. We demonstrate that our model exploits semantic information to generate captions for hundreds of object categories in the ImageNet object recognition dataset that are not observed in the MSCOCO image-caption training data, as well as many categories that are observed only rarely. Both automatic evaluations and human judgements show that our model considerably outperforms prior work and is able to describe many more object categories.
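The abstract says NOC minimizes a joint objective over three kinds of data: paired image-caption examples, labeled images from object recognition datasets, and unannotated text. The following is a minimal, hypothetical sketch of such an objective, assuming it is a weighted sum of one loss per data source: softmax cross-entropy on caption words, multi-label binary cross-entropy on object labels, and softmax cross-entropy on next words for plain text. The function names, the exact loss forms, and the default weights of 1.0 are illustrative assumptions, not the paper's published implementation.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target index (assumes probs[target] > 0)."""
    return -math.log(probs[target])


def sequence_nll(step_probs: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """Average per-token cross-entropy over a word sequence (caption or plain text)."""
    if not tokens:
        raise ValueError("sequence must contain at least one token")
    return sum(cross_entropy(p, t) for p, t in zip(step_probs, tokens)) / len(tokens)


def multilabel_bce(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Summed binary cross-entropy for multi-label object recognition.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
    """
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total


def joint_loss(caption_nll: float, image_loss: float, text_nll: float,
               w_caption: float = 1.0, w_image: float = 1.0,
               w_text: float = 1.0) -> float:
    """Hypothetical joint objective: weighted sum of the three per-source losses."""
    return w_caption * caption_nll + w_image * image_loss + w_text * text_nll


if __name__ == "__main__":
    # Paired image-caption example: two decoding steps over a 3-word vocabulary.
    cap = sequence_nll([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
    # Labeled image from a recognition dataset: logits for 3 object categories.
    img = multilabel_bce([2.0, -1.0, 0.5], [1, 0, 1])
    # Sentence from unannotated text: one next-word prediction.
    txt = sequence_nll([[0.5, 0.25, 0.25]], [0])
    print(joint_loss(cap, img, txt))
```

Summing per-source terms lets each mini-batch contribute whichever signal it carries: a recognition image with no caption still updates the visual term, and a text-only sentence still updates the language term.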
Pages: 1170-1178
Page count: 9