Learning Discriminative Representations for Big Data Clustering using Similarity-based Dimensionality Reduction

Cited by: 0
Authors
Passalis, Nikolaos [1]
Tefas, Anastasios [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece
Keywords
EXTENSIONS; ALGORITHM; BAG;
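DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract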
Discriminative clustering techniques simultaneously perform clustering and learn a representation that encourages the separability of the clusters. However, methods with high discriminative power tend to decrease clustering accuracy, since the cluster assignments are usually noisy. In this paper, we propose a similarity-based dimensionality reduction method that learns regularized clustering-oriented representations and scales efficiently to large datasets. We avoid the pitfalls of highly discriminative methods, such as Linear Discriminant Analysis (LDA), by maintaining a small similarity between inter-cluster samples and a small dissimilarity between intra-cluster samples, instead of collapsing the intra-cluster samples and pushing the clusters as far apart as possible. Three datasets are used to demonstrate the ability of the proposed method to learn robust representations that improve the quality of the obtained clustering solutions over other clustering techniques.
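The idea of keeping a small inter-cluster similarity and a small intra-cluster dissimilarity can be illustrated with a minimal sketch. This is not the authors' implementation: the soft target values (0.8 and 0.2), the Gaussian kernel, the linear projection, the plain gradient descent loop, and all function names are assumptions chosen only for illustration. The labels are assumed to come from any prior clustering step.

```python
import numpy as np

def soft_target_similarity(labels, intra=0.8, inter=0.2):
    """Target matrix: `intra` for same-cluster pairs, `inter` otherwise.

    Hard targets (1 and 0) would collapse clusters and push them apart
    without bound; the soft values here are illustrative assumptions.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return np.where(same, intra, inter)

def gaussian_similarity(Z, sigma=1.0):
    """Pairwise Gaussian-kernel similarity between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_projection(X, labels, dim=2, sigma=1.0, lr=0.02, steps=300, seed=0):
    """Learn a linear map W so that similarities of X @ W match the targets.

    Minimizes mean((S - T)^2) by gradient descent (hypothetical setup).
    Returns the projection matrix and the loss recorded at every step.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = rng.normal(scale=0.1, size=(X.shape[1], dim))
    T = soft_target_similarity(labels)
    losses = []
    for _ in range(steps):
        Z = X @ W
        S = gaussian_similarity(Z, sigma)
        E = S - T
        losses.append(float(np.mean(E ** 2)))
        # dL/dz_k = -(4 / (n^2 sigma^2)) * sum_j E_kj S_kj (z_k - z_j)
        G = E * S
        grad_Z = -(4.0 / (n ** 2 * sigma ** 2)) * (G.sum(axis=1)[:, None] * Z - G @ Z)
        W -= lr * (X.T @ grad_Z)
    return W, losses
```

Under these assumptions, clustering would then be run again on the projected data `X @ W`, and the two steps could be alternated.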
Pages: 5
Related papers
50 in total
  • [21] Manifold Learning for Dimensionality Reduction and Clustering of Skin Spectroscopy Data
    Safi, Asad
    Castaneda, Victor
    Lasser, Tobias
    Mateus, Diana C.
    Navab, Nassir
    MEDICAL IMAGING 2011: COMPUTER-AIDED DIAGNOSIS, 2011, 7963
  • [22] Feature Dimensionality Reduction for Visualization and Clustering on Learning Process Data
    Supianto, Ahmad Afif
    Christyawan, Tomi Yahya
    Hafis, Muhammad
    Hayashi, Yusuke
    Hirashima, Tsukasa
    Hasanah, Nur
    PROCEEDINGS OF 2019 4TH INTERNATIONAL CONFERENCE ON SUSTAINABLE INFORMATION ENGINEERING AND TECHNOLOGY (SIET 2019), 2019, : 84 - 89
  • [23] Similarity-based clustering of sequences using hidden Markov models
    Bicego, M
    Murino, V
    Figueiredo, MAT
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2003, 2734 : 86 - 95
  • [24] FedGroup: Efficient Federated Learning via Decomposed Similarity-Based Clustering
    Duan, Moming
    Liu, Duo
    Ji, Xinyuan
    Liu, Renping
    Liang, Liang
    Chen, Xianzhang
    Tan, Yujuan
    19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021), 2021, : 228 - 237
  • [25] Distributed dimensionality reduction of industrial data based on clustering
    Zhang, Yongyan
    Xie, Guo
    Wang, Wenqing
    Wang, Xiaofan
    Qian, Fucai
    Du, Xulong
    Du, Jinhua
    PROCEEDINGS OF THE 2018 13TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2018), 2018, : 370 - 374
  • [26] Feature reduction for imbalanced data classification using similarity-based feature clustering with adaptive weighted K-nearest neighbors
    Sun, Lin
    Zhang, Jiuxiao
    Ding, Weiping
    Xu, Jiucheng
    INFORMATION SCIENCES, 2022, 593 : 591 - 613
  • [27] iSoLIM: a similarity-based spatial prediction software for the big data era
    Zhao, Fang-He
    Zhu, A-Xing
    Zhu, Liang-Jun
    Qin, Cheng-Zhi
    ANNALS OF GIS, 2024,
  • [28] Efficient similarity-based data clustering by optimal object to cluster reallocation
    Rossignol, Mathias
    Lagrange, Mathieu
    Cont, Arshia
    PLOS ONE, 2018, 13 (06):
  • [29] A Similarity-Based Method for Entity Coreference Resolution in Big Data Environment
    Geng, Yushui
    Li, Peng
    Zhao, Jing
    PROCEEDINGS OF THE 2016 4TH INTERNATIONAL CONFERENCE ON ADVANCED MATERIALS AND INFORMATION TECHNOLOGY PROCESSING (AMITP 2016), 2016, 60 : 110 - 116
  • [30] Similarity matrix learning using dimensionality reduction for ontology applications
    Gao, Yun
    Li, Liang
    Wei, Gao
    Information Technology Journal, 2013, 12 (23) : 7442 - 7447