Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets

Authors:

Marcella Cornia

Lorenzo Baraldi

Giuseppe Fiameni

Rita Cucchiara

Affiliations:

[1] University of Modena and Reggio Emilia

[2] NVIDIA AI Technology Centre

[3] IIT-CNR

Source:

International Journal of Computer Vision | 2024 / Vol. 132

Keywords:

Image captioning; Vision and language; Multimodal learning

Abstract:

This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources, containing both human-annotated and web-collected captions. Large-scale datasets with noisy image-text pairs, indeed, provide a sub-optimal source of supervision because of their low-quality descriptive style, while human-annotated datasets are cleaner but smaller in scale. To get the best of both worlds, we propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component. The proposed model avoids the need for object detectors, is trained with a single objective of prompt language modeling, and can replicate the style of human-collected captions while training on sources with different input styles. Experimentally, the model shows a strong capability of recognizing real-world concepts and producing high-quality captions. Extensive experiments are performed on different image captioning datasets, including CC3M, nocaps, and the competitive COCO dataset, where our model consistently outperforms baselines and state-of-the-art approaches.

Pages: 1701-1720

Page count: 19
