Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets

Cited by: 0
Authors
Marcella Cornia
Lorenzo Baraldi
Giuseppe Fiameni
Rita Cucchiara
Affiliations
[1] University of Modena and Reggio Emilia
[2] NVIDIA AI Technology Centre
[3] IIT-CNR
Source
Keywords
Image captioning; Vision and language; Multimodal learning
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources, containing both human-annotated and web-collected captions. Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision because of their low-quality descriptive style, while human-annotated datasets are cleaner but smaller in scale. To get the best of both worlds, we propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component. The proposed model avoids the need for object detectors, is trained with a single objective of prompt language modeling, and can replicate the style of human-collected captions while training on sources with different input styles. Experimentally, the model shows a strong capability of recognizing real-world concepts and producing high-quality captions. Extensive experiments are performed on different image captioning datasets, including CC3M, nocaps, and the competitive COCO dataset, where our model consistently outperforms baselines and state-of-the-art approaches.
Pages: 1701 - 1720
Page count: 19
Related papers
50 in total
  • [31] Estimating Reservoir Release Using Multi-Source Satellite Datasets and Hydrological Modeling Techniques
    Shen, Youjiang
    Liu, Dedi
    Jiang, Liguang
    Tottrup, Christian
    Druce, Daniel
    Yin, Jiabo
    Nielsen, Karina
    Bauer-Gottwein, Peter
    Wang, Jun
    Zhao, Xin
    REMOTE SENSING, 2022, 14 (04)
  • [32] A semantics-based approach to multi-source heterogeneous information fusion in the internet of things
    Wang, Feng
    Hu, Liang
    Zhou, Jin
    Hu, Jiejun
    Zhao, Kuo
    SOFT COMPUTING, 2017, 21 (08) : 2005 - 2013
  • [33] A semantics-based approach to multi-source heterogeneous information fusion in the internet of things
    Feng Wang
    Liang Hu
    Jin Zhou
    Jiejun Hu
    Kuo Zhao
    Soft Computing, 2017, 21 : 2005 - 2013
  • [34] Multi-source information separation of a hydroelectric generating set based on EEMD-SOBI
    Zhi B.
    Qin J.
    Yang C.
    Yu Y.
    Zhendong yu Chongji/Journal of Vibration and Shock, 2023, 42 (04): 229 - 235+294
  • [35] A semi-automatic approach for generating geological profiles by integrating multi-source data
    Wang, Bin
    Wu, Liang
    Li, Wenjia
    Qiu, Qinjun
    Xie, Zhong
    Liu, Hao
    Zhou, Yuan
    ORE GEOLOGY REVIEWS, 2021, 134
  • [36] Latent multi-feature co-regression for visual recognition by discriminatively leveraging multi-source models
    Tao, Jianwen
    Zhou, Di
    Liu, Fangyu
    Zhu, Bin
    PATTERN RECOGNITION, 2019, 87 : 296 - 316
  • [37] Multi-Source Training-Free Controllable Style Transfer via Diffusion Models
    Yu, Cuihong
    Han, Cheng
    Zhang, Chao
    SYMMETRY-BASEL, 2025, 17 (02)
  • [38] Predicting China's Maize Yield Using Multi-Source Datasets and Machine Learning Algorithms
    Miao, Lijuan
    Zou, Yangfeng
    Cui, Xuefeng
    Kattel, Giri Raj
    Shang, Yi
    Zhu, Jingwen
    REMOTE SENSING, 2024, 16 (13)
  • [39] Integration for degradation analysis with multi-source ADT datasets considering dataset discrepancies and epistemic uncertainties
    Chen, Wen-Bin
    Li, Xiao-Yang
    Kang, Rui
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2022, 222
  • [40] Harmonizing Multi-Source Sonar Backscatter Datasets for Seabed Mapping Using Bulk Shift Approaches
    Misiuk, Benjamin
    Brown, Craig J.
    Robert, Katleen
    Lacharite, Myriam
    REMOTE SENSING, 2020, 12 (04)