Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching

Times cited: 0

Authors:

Shi, Haitao [1]

Liu, Meng [2]

Mu, Xiaoxuan [1]

Song, Xuemeng [3]

Hu, Yupeng [1]

Nie, Liqiang [4]

Affiliations:

[1] Shandong Univ, Sch Software, Jinan, Peoples R China

[2] Shandong Jianzhu Univ, Sch Comp Sci & Technol, Jinan, Peoples R China

[3] Shandong Univ, Sch Comp Sci & Technol, Qingdao, Peoples R China

[4] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen, Peoples R China

Source:

ACM TRANSACTIONS ON INFORMATION SYSTEMS | 2024 / Vol. 42 / No. 6

Funding:

National Natural Science Foundation of China;

Keywords:

Cross-modal retrieval; image-text matching; noisy correspondence; similarity distribution modeling;

DOI:

10.1145/3662732

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Noisy correspondence hampers the use of image-text matching in real-world applications. Manually curating high-quality datasets is expensive and time-consuming, and datasets generated with diffusion models are not well aligned. The most promising alternative is to collect image-text pairs from the Internet, but this inevitably introduces noisy correspondence. To reduce its negative impact, we propose a novel model that first transforms the noisy correspondence filtering problem into a similarity distribution modeling problem by exploiting the powerful capabilities of pre-trained models. Specifically, we use a Gaussian Mixture Model (GMM) to model the similarities computed by CLIP as a clean distribution and a noisy distribution, which filters out most of the noisy correspondence in the dataset. We then use the relatively clean data to fine-tune the model. To further reduce the negative impact of the noisy correspondence that remains unfiltered during fine-tuning, i.e., the small portion lying where the two distributions intersect, we propose a distribution-sensitive dynamic margin ranking loss that further increases the distance between the two distributions. Through continuous iteration, the noisy correspondence gradually decreases and the model performance gradually improves. Extensive experiments demonstrate the effectiveness and robustness of our model, even under high noise rates.
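The filtering step described above (fit a two-component Gaussian mixture to image-text similarity scores, then treat one component as "clean" and the other as "noisy") can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the similarity scores are assumed to be precomputed (for example, CLIP cosine similarities), the component with the higher mean is assumed to be the clean one, a posterior threshold of 0.5 is assumed, and the function names `fit_two_component_gmm` and `filter_clean_pairs` are hypothetical. EM is written out in plain Python so the sketch has no third-party dependencies.

```python
import math
from typing import List, Tuple


def _normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(
    sims: List[float], n_iter: int = 100, min_var: float = 1e-6
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1-D Gaussian mixture to similarity scores with EM.

    Returns (weights, means, variances), one entry per component.
    """
    n = len(sims)
    means = [min(sims), max(sims)]  # start one component at each end of the range
    mu = sum(sims) / n
    var0 = max(sum((s - mu) ** 2 for s in sims) / n, min_var)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        resp = []
        for s in sims:
            p = [weights[k] * _normal_pdf(s, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to 0.0
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixing weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            means[k] = sum(r[k] * s for r, s in zip(resp, sims)) / nk
            variances[k] = max(
                sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, sims)) / nk,
                min_var,  # floor keeps a component from collapsing to zero width
            )
    return weights, means, variances


def filter_clean_pairs(sims: List[float], threshold: float = 0.5) -> List[int]:
    """Return indices of pairs whose posterior for the 'clean' component exceeds threshold.

    Assumption: the higher-mean component models clean (matched) pairs.
    """
    weights, means, variances = fit_two_component_gmm(sims)
    clean = 0 if means[0] > means[1] else 1
    keep = []
    for i, s in enumerate(sims):
        p = [weights[k] * _normal_pdf(s, means[k], variances[k]) for k in range(2)]
        total = sum(p)
        if total > 0 and p[clean] / total > threshold:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Toy similarity scores: five well-matched pairs and three mismatched ones.
    scores = [0.31, 0.29, 0.33, 0.30, 0.32, 0.12, 0.10, 0.11]
    print(filter_clean_pairs(scores))  # expected: [0, 1, 2, 3, 4]
```

In the pipeline the abstract describes, the retained indices would select the "relatively clean" subset used for fine-tuning; the pairs near the overlap of the two components are the ones the ranking loss is meant to handle, which this sketch does not attempt.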

Pages: 26