Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets

Cited by: 0
Authors
Marcella Cornia
Lorenzo Baraldi
Giuseppe Fiameni
Rita Cucchiara
Affiliations
[1] University of Modena and Reggio Emilia
[2] NVIDIA AI Technology Centre
[3] IIT-CNR
Source
Keywords
Image captioning; Vision and language; Multimodal learning
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources, containing both human-annotated and web-collected captions. Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision because of their low-quality descriptive style, while human-annotated datasets are cleaner but smaller in scale. To get the best of both worlds, we propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component. The proposed model avoids the need for object detectors, is trained with a single objective of prompt language modeling, and can replicate the style of human-collected captions while training on sources with different input styles. Experimentally, the model shows a strong capability of recognizing real-world concepts and producing high-quality captions. Extensive experiments are performed on different image captioning datasets, including CC3M, nocaps, and the competitive COCO dataset, where our model consistently outperforms baselines and state-of-the-art approaches.
Pages: 1701-1720
Page count: 19