Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets

Cited by: 0

Authors:

Marcella Cornia

Lorenzo Baraldi

Giuseppe Fiameni

Rita Cucchiara

Affiliations:

[1] University of Modena and Reggio Emilia

[2] NVIDIA AI Technology Centre

[3] IIT-CNR

Source:

International Journal of Computer Vision | 2024 / Vol. 132

Keywords:

Image captioning; Vision and language; Multimodal learning

DOI:

Not available

Abstract:

This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources, containing both human-annotated and web-collected captions. Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision because of their low-quality descriptive style, while human-annotated datasets are cleaner but smaller in scale. To get the best of both worlds, we propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component. The proposed model avoids the need for object detectors, is trained with a single objective of prompt language modeling, and can replicate the style of human-collected captions while training on sources with different input styles. Experimentally, the model shows a strong capability of recognizing real-world concepts and producing high-quality captions. Extensive experiments are performed on different image captioning datasets, including CC3M, nocaps, and the competitive COCO dataset, where our model consistently outperforms baselines and state-of-the-art approaches.
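
One way to picture the separation of style and semantics described in the abstract is as prompt construction: a style token marks the caption source, and keywords are retrieved by similarity to the image. The sketch below is a toy illustration under assumptions, not the authors' implementation. The token strings `<human>` and `<web>`, the function names, and the small hand-made embeddings are all hypothetical, and the abstract does not specify how the retrieval component or the prompt is actually realized.

```python
import math

# Hypothetical style tokens; the abstract only says that "a style token" is used.
STYLE_TOKENS = {"human": "<human>", "web": "<web>"}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_keywords(image_vec, keyword_vecs, k=2):
    """Return the k keywords whose embeddings are most similar to the image."""
    ranked = sorted(
        keyword_vecs,
        key=lambda word: cosine(image_vec, keyword_vecs[word]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(style, keywords):
    """Build a text prefix: the style token followed by the retrieved keywords."""
    return " ".join([STYLE_TOKENS[style], *keywords])


# Toy embeddings, invented for illustration only.
keyword_vecs = {
    "dog": [1.0, 0.0, 0.0],
    "frisbee": [0.8, 0.6, 0.0],
    "car": [0.0, 0.0, 1.0],
}
image_vec = [0.9, 0.4, 0.0]

keywords = retrieve_keywords(image_vec, keyword_vecs, k=2)
prompt = build_prompt("human", keywords)
print(prompt)
```

In this sketch, switching the first argument of `build_prompt` to `"web"` changes only the style token while the keywords stay the same, which is the sense in which style and semantics are kept apart here.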

Pages: 1701-1720

Page count: 19