Projection-Based Partitioning for Large, High-Dimensional Datasets

Cited by: 2
Authors
Ilies, Iulian [1]
Wilhelm, Adalbert [1]
Affiliations
[1] Jacobs Univ Bremen, Sch Humanities & Social Sci, D-28759 Bremen, Germany
Keywords
Average shifted histogram; Excess mass; Local minimum; Principal component analysis; Sampling; Single linkage; Cluster tree; Density
DOI
10.1198/jcgs.2010.08038
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Recent work in the field of cluster analysis has focused on designing algorithms that address the issue of ever-growing datasets and provide meaningful solutions for data with high cardinality and/or dimensionality, under the natural restriction of limited resources. Within this line of research, we propose a method drawing on the principles of projection pursuit and grid partitioning, which focuses on reducing computational requirements for large datasets without loss of performance. To achieve this, we rely on procedures such as sampling of objects, feature selection, and quick density estimation using histograms. The present algorithm searches for low-density points in potentially favorable one-dimensional projections, and partitions the data by a hyperplane passing through the best split point found. Tests on synthetic and reference data indicate that our method can quickly and efficiently recover clusters that are distinguishable from the remaining objects along at least one direction; linearly nonseparable clusters are usually subdivided. The solution is robust in the presence of moderate levels of noise, and when the clusters are partially overlapping. An implementation of the algorithm is available online, as supplemental material.
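The splitting step outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that candidate projections are the leading principal components of a random sample, that densities are estimated with an average shifted histogram, and that the "best" split is the local minimum with the largest relative depth between its neighboring modes. That depth score is a simple stand-in chosen for illustration, not the criterion used in the paper. All function names and default parameters are hypothetical.

```python
import numpy as np

def average_shifted_histogram(x, n_bins=20, n_shifts=4):
    """ASH density estimate on a fine grid (cell width = bin width / n_shifts)."""
    lo, hi = x.min(), x.max()
    width = (hi - lo) / n_bins
    delta = width / n_shifts
    # Pad one full bin on each side so the smoothed estimate decays to zero.
    edges = lo - width + delta * np.arange((n_bins + 2) * n_shifts + 1)
    counts, _ = np.histogram(x, bins=edges)
    # Triangular weights are equivalent to averaging n_shifts shifted histograms.
    k = np.arange(1 - n_shifts, n_shifts)
    weights = 1.0 - np.abs(k) / n_shifts
    density = np.convolve(counts, weights, mode="same") / (len(x) * width)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density

def best_split_1d(x, n_bins=20, n_shifts=4):
    """Return (split point, relative depth) of the deepest local density minimum."""
    if np.ptp(x) == 0:
        return None, 0.0
    centers, dens = average_shifted_histogram(x, n_bins, n_shifts)
    best_point, best_depth = None, 0.0
    for i in range(1, len(dens) - 1):
        if dens[i] <= dens[i - 1] and dens[i] <= dens[i + 1]:
            # Assumed score: how far the valley lies below the lower of the
            # two highest peaks on either side, relative to the global peak.
            depth = (min(dens[:i].max(), dens[i + 1:].max()) - dens[i]) / dens.max()
            if depth > best_depth:
                best_point, best_depth = centers[i], depth
    return best_point, best_depth

def projection_split(X, sample_size=1000, n_components=3, seed=0):
    """Split X in two by a hyperplane through the best 1-D split point found."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sample = X[rng.choice(n, size=min(sample_size, n), replace=False)]
    mean = sample.mean(axis=0)
    # Principal directions of the sample via SVD of the centered data.
    _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
    best = (None, None, 0.0)
    for direction in vt[:n_components]:
        point, depth = best_split_1d((sample - mean) @ direction)
        if point is not None and depth > best[2]:
            best = (direction, point, depth)
    direction, point, _ = best
    if direction is None:
        return None  # no usable low-density point found
    labels = ((X - mean) @ direction > point).astype(int)
    return labels, direction, point
```

Calling `projection_split(X)` on an `(n, d)` array returns a 0/1 label per row together with the hyperplane normal and offset, or `None` when no projection shows a density valley. Applying the function recursively to each side would give a divisive partitioning in the spirit of the abstract, under the same assumptions.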
Pages: 474-492
Number of pages: 19