Finding multiple global linear correlations in sparse and noisy data sets

Cited by: 0
Authors
Zhu, Shunzhi [1 ]
Tang, Liang [2 ]
Li, Tao [2 ]
Affiliations
[1] Xiamen Univ Technol, Sch Comp Sci & Technol, Xiamen 361024, Peoples R China
[2] Florida Int Univ, Sch Comp Sci, Miami, FL 33199 USA
Funding
National Natural Science Foundation of China; US National Science Foundation;
Keywords
Correlation pattern; Subspace clustering; Global linear correlation; Divide and conquer strategy; DCSearch; GPCA ALGORITHM; DIMENSION; SUBSPACES; CLUSTERS;
DOI
10.1016/j.knosys.2013.08.015
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Finding linear correlations is an important research problem with numerous real-world applications. In real-world data sets, a linear correlation may not hold over the entire data set; some linear correlations are visible only in certain data subsets. On one hand, many local correlation clustering algorithms assume that the data points of a linear correlation are locally dense, so these methods may miss global correlations when data points are sparsely distributed. On the other hand, existing global correlation clustering methods may fail when the data set contains a large number of non-correlated points or the actual correlations are coarse. This paper proposes DCSearch, a simple and fast algorithm for finding multiple global linear correlations in a data set. The algorithm can find coarse, global linear correlations in noisy and sparse data sets. Using the classical divide-and-conquer strategy, it first divides the data set into subsets to reduce the search space, and then recursively searches and prunes the candidate correlations from the subsets. Empirical studies show that DCSearch efficiently reduces the number of candidate correlations during each iteration. Experimental results on both synthetic and real data sets demonstrate that DCSearch is effective and efficient in finding global linear correlations in sparse and noisy data sets. (C) 2013 Elsevier B.V. All rights reserved.
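The abstract describes the strategy only at a high level. The Python sketch below is a minimal illustration of a divide-and-conquer search for global linear correlations under stated assumptions; it is not the authors' DCSearch implementation. The subset count, group size, distance threshold eps, support threshold, and the candidate-generation step (fitting hyperplanes to small point groups) are illustrative choices, and the recursive pruning/merging described in the paper is reduced here to a single global-support filter.

import numpy as np

# Minimal sketch, NOT the authors' DCSearch implementation: illustrates the
# divide-and-conquer idea of generating candidate linear correlations from
# data subsets and pruning them by their GLOBAL support in the full data set.
# All parameter values and the hyperplane-fitting step are assumptions.

def fit_hyperplane(points):
    """Fit a hyperplane n.x + b = 0 through a small group of points via SVD."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                      # direction of smallest variance
    return normal, -normal @ centroid

def global_support(data, normal, offset, eps):
    """Number of points in the WHOLE data set within distance eps of the plane."""
    dist = np.abs(data @ normal + offset) / np.linalg.norm(normal)
    return int((dist < eps).sum())

def dc_search_sketch(data, n_subsets=8, groups_per_subset=30,
                     eps=0.05, min_support=0.10, seed=0):
    """Divide the data into subsets, fit candidate hyperplanes from small point
    groups inside each subset, and keep candidates with enough global support.
    (Deduplication and the recursive refinement of DCSearch are omitted.)"""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    candidates = []
    for subset in np.array_split(rng.permutation(n), n_subsets):
        block = data[subset]
        if len(block) < d:
            continue
        for _ in range(groups_per_subset):
            idx = rng.choice(len(block), size=d, replace=False)
            normal, offset = fit_hyperplane(block[idx])
            # Prune by support over the ENTIRE data set, not the local subset.
            if global_support(data, normal, offset, eps) >= min_support * n:
                candidates.append((normal, offset))
    return candidates

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Sparse, noisy 3-D example: 60 points on the plane z = 0.5x plus 200
    # uniformly scattered noise points.
    plane = rng.uniform(-1, 1, size=(60, 3))
    plane[:, 2] = 0.5 * plane[:, 0]
    noise = rng.uniform(-1, 1, size=(200, 3))
    found = dc_search_sketch(np.vstack([plane, noise]))
    print(f"{len(found)} candidate correlation(s) survived global pruning")

The detail the sketch tries to preserve is that pruning is based on support computed over the entire data set rather than on local density, which is what allows a sparse but globally consistent correlation to survive.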
Pages: 40-50
Number of pages: 11
Related Papers
50 records
  • [41] Fuzzy-Clustering-Based Discriminant Method of Multiple Quadric Surfaces for Noisy and Sparse Range Data
    Kawano, Hideaki
    Maeda, Hiroshi
    Ikoma, Norikazu
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2010, 14 (02) : 160 - 166
  • [42] An Analysis of Boosted Linear Classifiers on Noisy Data with Applications to Multiple-Instance Learning
    Liu, Rui
    Ray, Soumya
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2017, : 287 - 296
  • [43] CALCULATING LYAPUNOV EXPONENTS FOR SHORT AND/OR NOISY DATA SETS
    BROWN, R
    PHYSICAL REVIEW E, 1993, 47 (06): : 3962 - 3969
  • [44] Treating noisy data sets with relaxed genetic programming
    Da Costa, Luis
    Landry, Jacques-Andre
    Levasseur, Yan
    ARTIFICIAL EVOLUTION, 2008, 4926 : 1 - 12
  • [45] Credal Decision Trees to Classify Noisy Data Sets
    Mantas, Carlos J.
    Abellan, Joaquin
    HYBRID ARTIFICIAL INTELLIGENCE SYSTEMS, HAIS 2014, 2014, 8480 : 689 - 696
  • [46] Mesh modelling for sparse image data sets
    Coleman, SA
    Scotney, BW
    2005 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), VOLS 1-5, 2005, : 2321 - 2324
  • [47] Default clustering from sparse data sets
    Velcin, J
    Ganascia, JG
    SYMBOLIC AND QUANTITATIVE APPROACHES TO REASONING WITH UNCERTAINTY, PROCEEDINGS, 2005, 3571 : 968 - 979
  • [48] Sparse regression for large data sets with outliers
    Bottmer, Lea
    Croux, Christophe
    Wilms, Ines
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2022, 297 (02) : 782 - 794
  • [49] Collaborative filtering algorithm for sparse data sets
    Dong, Li
    Xing, Chunxiao
    Wang, Kehong
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2009, 49 (10): : 1725 - 1728
  • [50] Recovering Noisy -Pseudo -Sparse Signals from Linear Measurements via
    Zhang, Hang
    Abdi, Afshin
    Fekri, Faramarz
    2019 57TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2019, : 1154 - 1159