Deep contrastive representation learning for multi-modal clustering

Cited by: 2
Authors
Lu, Yang [1 ,2 ]
Li, Qin [3 ]
Zhang, Xiangdong [1 ]
Gao, Quanxue [1 ]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
[2] Res Inst Air Force, Beijing, Peoples R China
[3] Shenzhen Inst Informat Technol, Sch Software Engn, Shenzhen 518172, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-view representation learning; Self-supervision; Clustering;
DOI
10.1016/j.neucom.2024.127523
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Benefiting from the informative expression capability of contrastive representation learning (CRL), recent multi-modal learning studies have achieved promising clustering performance. However, existing CRL-based multi-modal clustering methods fail to simultaneously exploit the similarity information embedded at both the inter- and intra-modal levels. In this study, we explore deep multi-modal contrastive representation learning and present a multi-modal learning network, named trustworthy multi-modal contrastive clustering (TMCC), which combines contrastive learning and adaptively reliable sample selection with multi-modal clustering. Specifically, we design an adaptive filter that trains TMCC by progressing from 'easy' to 'complex' samples. Based on this, using the highly confident clustering labels, we present a new contrastive loss for learning a modal-consensus representation, which takes into account not only the inter-modal similarity but also the intra-modal similarity. Experimental results show that these principles consistently improve the clustering performance of TMCC.
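The loss described in the abstract combines two kinds of positives: the same sample seen in different modalities (inter-modal), and samples within one modality that share a confident cluster pseudo-label (intra-modal). The following is a minimal NumPy sketch of such a combined objective, not the authors' TMCC implementation; the function name `multimodal_contrastive_loss`, the temperature `tau`, and the use of pseudo-labels to define intra-modal positives are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=1, eps=1e-8):
    """Row-normalize embeddings so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def multimodal_contrastive_loss(z1, z2, labels, tau=0.5):
    """Toy two-modality contrastive loss.

    z1, z2 : (n, d) embeddings of the same n samples in two modalities.
    labels : (n,) confident cluster pseudo-labels.
    Returns an inter-modal InfoNCE term plus an intra-modal term that
    treats same-pseudo-label pairs within each modality as positives.
    """
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    n = z1.shape[0]

    # Inter-modal term: cross-modality similarity matrix, positives on
    # the diagonal (sample i in modality 1 vs sample i in modality 2).
    sim = z1 @ z2.T / tau
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    inter = -np.mean(np.diag(logp))

    # Intra-modal term: within each modality, pull together samples that
    # share a pseudo-label and push apart the rest.
    intra = 0.0
    for z in (z1, z2):
        s = z @ z.T / tau
        np.fill_diagonal(s, -np.inf)  # exclude trivial self-pairs
        logp = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
        pos = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
        intra += -np.mean(logp[pos])
    return inter + intra / 2
```

Because both terms are negative log-softmax values, the loss is strictly positive; training would minimize it over the encoder parameters that produce `z1` and `z2`.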
Pages: 8
Related papers
50 records total
  • [41] Editorial for Special Issue on Multi-modal Representation Learning
    Fan, Deng-Ping
    Barnes, Nick
    Cheng, Ming-Ming
    Van Gool, Luc
    MACHINE INTELLIGENCE RESEARCH, 2024, 21 (04) : 615 - 616
  • [42] Multi-modal Graph Contrastive Learning for Micro-video Recommendation
    Yi, Zixuan
    Wang, Xi
    Ounis, Iadh
    Macdonald, Craig
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 1807 - 1811
  • [43] Reliable multi-modal prototypical contrastive learning for difficult airway assessment
    Li, Xiaofan
    Peng, Bo
    Yao, Yuan
    Zhang, Guangchao
    Xie, Zhuyang
    Saleem, Muhammad Usman
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 273
  • [44] A Hard Negatives Mining and Enhancing Method for Multi-Modal Contrastive Learning
    Li, Guangping
    Gao, Yanan
    Huang, Xianhui
    Ling, Bingo Wing-Kuen
    ELECTRONICS, 2025, 14 (04)
  • [45] ConOffense: Multi-modal multitask Contrastive learning for offensive content identification
    Shome, Debaditya
    Kar, T.
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 4524 - 4529
  • [46] A deep contrastive multi-modal encoder for multi-omics data integration and analysis
    Yinghua, Ma
    Khan, Ahmad
    Heng, Yang
    Khan, Fiaz Gul
    Ali, Farman
    Al-Otaibi, Yasser D.
    Bashir, Ali Kashif
    INFORMATION SCIENCES, 2025, 700
  • [47] MultiCAD: Contrastive Representation Learning for Multi-modal 3D Computer-Aided Design Models
    Ma, Weijian
    Xu, Minyang
    Li, Xueyang
    Zhou, Xiangdong
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 1766 - 1776
  • [48] Hierarchical sparse representation with deep dictionary for multi-modal classification
    Wang, Zhengxia
    Teng, Shenghua
    Liu, Guodong
    Zhao, Zengshun
    Wu, Hongli
    NEUROCOMPUTING, 2017, 253 : 65 - 69
  • [49] Multi-scale and multi-modal contrastive learning network for biomedical time series
    Guo, Hongbo
    Xu, Xinzi
    Wu, Hao
    Liu, Bin
    Xia, Jiahui
    Cheng, Yibang
    Guo, Qianhui
    Chen, Yi
    Xu, Tingyan
    Wang, Jiguang
    Wang, Guoxing
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 106
  • [50] Multi-Modal Song Mood Detection with Deep Learning
    Pyrovolakis, Konstantinos
    Tzouveli, Paraskevi
    Stamou, Giorgos
    SENSORS, 2022, 22 (03)