Accelerating high-dimensional clustering with lossless data reduction

Cited by: 3
Authors
Qaqish, Bahjat F. [1]
O'Brien, Jonathon J. [2]
Hibbard, Jonathan C. [1]
Clowers, Katie J. [2]
Affiliations
[1] Univ North Carolina Chapel Hill, Dept Biostat, Chapel Hill, NC 27599 USA
[2] Harvard Med Sch, Dept Cell Biol, Boston, MA 02115 USA
Keywords
CLASS DISCOVERY; GENE;
DOI
10.1093/bioinformatics/btx328
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: For cluster analysis, high-dimensional data are associated with instability, decreased classification accuracy and high computational burden. The latter challenge can be eliminated as a serious concern. For applications where dimension reduction techniques are not implemented, we propose a temporary transformation which accelerates computations with no loss of information. The algorithm can be applied for any statistical procedure depending only on Euclidean distances and can be implemented sequentially to enable analyses of data that would otherwise exceed memory limitations.
Results: The method is easily implemented in common statistical software as a standard preprocessing step. The benefit of our algorithm grows with the dimensionality of the problem and the complexity of the analysis. Consequently, our simple algorithm not only decreases the computation time for routine analyses, it opens the door to performing calculations that may have otherwise been too burdensome to attempt.
Availability and implementation: R, Matlab and SAS/IML code for implementing lossless data reduction is freely available in the Appendix.
Contact: obrienj@hms.harvard.edu
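Illustrative sketch (not the authors' code): the abstract does not spell out the transformation, so the following Python/NumPy example only shows one standard way to obtain a distance-preserving reduction when the number of variables p far exceeds the number of observations n. It assumes a QR decomposition of the transposed data matrix is an acceptable stand-in; the function names `reduce_lossless` and `pairwise_distances` are hypothetical and the data are simulated.

```python
import numpy as np


def reduce_lossless(X):
    """Map an n x p matrix (p > n) to an n x n matrix whose rows have
    the same pairwise Euclidean distances as the rows of X.

    Assumption: a QR-based reduction, used here for illustration only.
    """
    n, p = X.shape
    if p <= n:
        # Nothing to gain when there are no more columns than rows.
        return X.copy()
    # X^T = Q R, with Q (p x n) having orthonormal columns and R (n x n).
    # Row i of X equals Q @ R[:, i], so ||x_i - x_j|| = ||R[:, i] - R[:, j]||.
    R = np.linalg.qr(X.T, mode="r")
    return R.T


def pairwise_distances(A):
    """Dense matrix of Euclidean distances between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Simulated example: 20 observations, 5000 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5000))
Z = reduce_lossless(X)

# Largest discrepancy between original and reduced distance matrices;
# it should be at the level of floating-point rounding error.
max_abs_diff = np.abs(pairwise_distances(X) - pairwise_distances(Z)).max()
```

Any procedure that depends on the data only through Euclidean distances (for example hierarchical clustering or k-means) could then be run on `Z`, which has 20 columns instead of 5000.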
Pages: 2867-2872
Number of pages: 6
Related papers
50 in total
  • [1] Accelerating Density-Based Subspace Clustering in High-Dimensional Data
    Prinzbach, Juergen
    Lauer, Tobias
    Kiefer, Nicolas
    [J]. 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021, : 474 - 481
  • [2] High-dimensional data clustering
    Bouveyron, C.
    Girard, S.
    Schmid, C.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) : 502 - 519
  • [3] Clustering High-Dimensional Data
    Masulli, Francesco
    Rovetta, Stefano
    [J]. CLUSTERING HIGH-DIMENSIONAL DATA, CHDD 2012, 2015, 7627 : 1 - 13
  • [4] Hierarchical Clustering of High-Dimensional Data Without Global Dimensionality Reduction
    Kampman, Ilari
    Elomaa, Tapio
    [J]. FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2018), 2018, 11177 : 236 - 246
  • [5] Clustering of High-Dimensional and Correlated Data
    McLachlan, Geoffrey J.
    Ng, Shu-Kay
    Wang, K.
    [J]. DATA ANALYSIS AND CLASSIFICATION, 2010, : 3 - 11
  • [6] Clustering in high-dimensional data spaces
    Murtagh, FD
    [J]. STATISTICAL CHALLENGES IN ASTRONOMY, 2003, : 279 - 292
  • [7] Compressive Clustering of High-dimensional Data
    Ruta, Andrzej
    Porikli, Fatih
    [J]. 2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1, 2012, : 380 - 385
  • [8] ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data
    Fatehi, Kavan
    Rezvani, Mohsen
    Fateh, Mansoor
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2020, 23 (04) : 1651 - 1663
  • [9] ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data
    Kavan Fatehi
    Mohsen Rezvani
    Mansoor Fateh
    [J]. Pattern Analysis and Applications, 2020, 23 : 1651 - 1663
  • [10] An effective clustering scheme for high-dimensional data
    He, Xuansen
    He, Fan
    Fan, Yueping
    Jiang, Lingmin
    Liu, Runzong
    Maalla, Allam
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 45001 - 45045