Learning Discriminative Representations for Big Data Clustering using Similarity-based Dimensionality Reduction

Authors
Passalis, Nikolaos [1 ]
Tefas, Anastasios [1 ]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece
Keywords
EXTENSIONS; ALGORITHM; BAG;
DOI
Not available
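Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes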
0808 ; 0809 ;
Abstract
Discriminative clustering techniques simultaneously perform clustering and learn a representation that encourages the separability of the clusters. However, methods with high discriminative power tend to decrease clustering accuracy, since the cluster assignments are usually noisy. This paper proposes a similarity-based dimensionality reduction method that learns regularized, clustering-oriented representations and scales efficiently to large datasets. The method avoids the pitfalls of highly discriminative methods, such as Linear Discriminant Analysis (LDA), by maintaining a small similarity between inter-cluster samples and a small dissimilarity between intra-cluster samples, instead of collapsing the intra-cluster samples and pushing the clusters as far apart as possible. Three datasets are used to demonstrate that the proposed method learns robust representations that improve the quality of the obtained clustering solutions over other clustering techniques.
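
The idea of soft similarity targets described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian kernel, the linear projection, the squared-error loss, and the target values 0.8 (intra-cluster) and 0.2 (inter-cluster) are all assumptions chosen for illustration. Setting the targets to 1 and 0 instead would ask the projection to collapse each cluster and push clusters fully apart, which is the highly discriminative behaviour the abstract warns against when cluster assignments are noisy.

```python
import numpy as np


def target_similarity(labels, intra=0.8, inter=0.2):
    """Soft target similarities built from (possibly noisy) cluster labels.

    intra < 1 keeps a small dissimilarity inside a cluster and inter > 0 keeps
    a small similarity across clusters. The values 0.8 / 0.2 are illustrative
    assumptions, not taken from the paper.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    T = np.where(same, intra, inter).astype(float)
    np.fill_diagonal(T, 1.0)  # a sample is fully similar to itself
    return T


def projected_similarity(X, W, sigma=1.0):
    """Gaussian-kernel similarities between samples after projecting with W."""
    Y = X @ W
    sq = (Y ** 2).sum(axis=1)
    # Pairwise squared distances, clipped at 0 to absorb rounding noise.
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    return np.exp(-D / sigma), Y


def loss_and_grad(X, W, T, sigma=1.0):
    """Mean squared gap between projected and target similarities, with dL/dW."""
    P, Y = projected_similarity(X, W, sigma)
    n = X.shape[0]
    diff = P - T
    loss = (diff ** 2).mean()
    G = -2.0 * diff * P / (sigma * n * n)  # dL/dD for each pair
    grad_Y = 4.0 * (G.sum(axis=1)[:, None] * Y - G @ Y)  # G is symmetric
    return loss, X.T @ grad_Y


def fit_projection(X, labels, dim=2, steps=100, lr=1.0, sigma=1.0, seed=0):
    """Gradient descent on W; a step is kept only if it lowers the loss."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], dim))
    T = target_similarity(labels)
    loss, grad = loss_and_grad(X, W, T, sigma)
    for _ in range(steps):
        W_new = W - lr * grad
        new_loss, new_grad = loss_and_grad(X, W_new, T, sigma)
        if new_loss < loss:
            W, loss, grad = W_new, new_loss, new_grad
        else:
            lr *= 0.5  # step too large: shrink it and retry
    return W, loss
```

In a clustering loop, `labels` would come from the current cluster assignments (for example, from k-means on the current representation), and `X @ W` would be the representation passed to the next clustering round. The sketch builds the full pairwise matrix, so it would need mini-batching before it could be applied to large datasets.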
Pages: 5