Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets

Authors
Marcella Cornia
Lorenzo Baraldi
Giuseppe Fiameni
Rita Cucchiara
Affiliations
[1] University of Modena and Reggio Emilia
[2] NVIDIA AI Technology Centre
[3] IIT-CNR
Keywords
Image captioning; Vision and language; Multimodal learning
DOI
Not available
Abstract
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources, containing both human-annotated and web-collected captions. Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision because of their low-quality descriptive style, while human-annotated datasets are cleaner but smaller in scale. To get the best of both worlds, we propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component. The proposed model avoids the need for object detectors, is trained with a single objective of prompt language modeling, and can replicate the style of human-collected captions while training on sources with different input styles. Experimentally, the model shows a strong capability of recognizing real-world concepts and producing high-quality captions. Extensive experiments are performed on different image captioning datasets, including CC3M, nocaps, and the competitive COCO dataset, where our model consistently outperforms baselines and state-of-the-art approaches.
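The idea of conditioning a caption on a style token plus retrieved keywords can be illustrated with a minimal sketch. It is not the authors' implementation: the token names, the toy embeddings, and the helper functions `retrieve_keywords` and `build_prompt` are all assumptions made for illustration, and cosine-similarity ranking stands in for whatever retrieval component the paper actually uses.

```python
import math

# Hypothetical style tokens, one per caption source type (names are assumptions).
STYLE_TOKENS = {"human": "<style:human>", "web": "<style:web>"}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_keywords(image_emb, keyword_embs, k=2):
    """Return the k keywords whose embeddings are most similar to the image."""
    ranked = sorted(
        keyword_embs,
        key=lambda word: cosine(image_emb, keyword_embs[word]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(style, keywords):
    """Prefix the caption with a style token and the retrieved keywords."""
    return [STYLE_TOKENS[style], *keywords, "<bos>"]


# Toy 2-D embeddings, made up for illustration only.
keyword_embs = {"dog": [1.0, 0.0], "frisbee": [0.8, 0.6], "car": [0.0, 1.0]}
image_emb = [1.0, 0.2]

keywords = retrieve_keywords(image_emb, keyword_embs, k=2)
prompt = build_prompt("human", keywords)
print(prompt)  # ['<style:human>', 'dog', 'frisbee', '<bos>']
```

In this sketch, a language model would be trained to continue the prompt with the caption tokens. At inference time, choosing the "human" style token would request the cleaner descriptive style regardless of which source supplied the training pair.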
Pages: 1701-1720
Page count: 19