Finding multiple global linear correlations in sparse and noisy data sets

Citations: 0
Authors
Zhu, Shunzhi [1 ]
Tang, Liang [2 ]
Li, Tao [2 ]
Affiliations
[1] Xiamen Univ Technol, Sch Comp Sci & Technol, Xiamen 361024, Peoples R China
[2] Florida Int Univ, Sch Comp Sci, Miami, FL 33199 USA
Funding
National Natural Science Foundation of China; National Science Foundation (USA);
Keywords
Correlation pattern; Subspace clustering; Global linear correlation; Divide and conquer strategy; DCSearch; GPCA ALGORITHM; DIMENSION; SUBSPACES; CLUSTERS;
DOI
10.1016/j.knosys.2013.08.015
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Finding linear correlations is an important research problem with numerous real-world applications. In real-world data sets, a linear correlation may not hold across the entire data set; some linear correlations are visible only in certain subsets of the data. On one hand, many local correlation clustering algorithms assume that the data points of a linear correlation are locally dense, so they may miss global correlations whose points are sparsely distributed. On the other hand, existing global correlation clustering methods may fail when the data set contains a large number of non-correlated points or when the actual correlations are coarse. This paper proposes DCSearch, a simple and fast algorithm for finding multiple global linear correlations in a data set; it is able to find coarse, global linear correlations in noisy and sparse data. Using the classical divide and conquer strategy, it first divides the data set into subsets to reduce the search space, and then recursively searches and prunes candidate correlations from the subsets. Empirical studies show that DCSearch efficiently reduces the number of candidate correlations during each iteration. Experimental results on both synthetic and real data sets demonstrate that DCSearch is effective and efficient in finding global linear correlations in sparse and noisy data sets. (C) 2013 Elsevier B.V. All rights reserved.
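The divide-and-conquer idea described in the abstract can be sketched roughly as follows. This is a minimal illustrative reconstruction, not the paper's actual DCSearch implementation: the function names, the restriction to 2-D lines y = a*x + b, and the `tol`/`min_support` parameters are assumptions made for the example, while the published algorithm targets general linear correlations with a more elaborate recursive pruning scheme.

```python
import numpy as np

def fit_line(points):
    """Least-squares fit of y = a*x + b to an (n, 2) array of points."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

def dc_search(points, n_subsets=4, tol=0.1, min_support=10, rng=None):
    """Divide-and-conquer sketch: fit candidate lines on random subsets
    (divide), then keep only those candidates that enough points of the
    full data set lie close to (prune), refitting on those inliers."""
    rng = np.random.default_rng(rng)
    shuffled = points[rng.permutation(len(points))]
    chunks = np.array_split(shuffled, n_subsets)
    candidates = [fit_line(c) for c in chunks if len(c) >= 2]
    correlations = []
    for a, b in candidates:
        # Support of a candidate: points within `tol` of the line, measured
        # over the whole data set, so sparse global correlations still count.
        resid = np.abs(a * points[:, 0] + b - points[:, 1])
        inliers = points[resid < tol]
        if len(inliers) >= min_support:
            correlations.append(fit_line(inliers))  # refine on global inliers
    return correlations
```

Because candidates are generated from small subsets but validated against the entire data set, a correlation whose points are globally spread (and locally sparse) can still accumulate enough support to survive pruning, which is the property the abstract emphasizes.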
Pages: 40 - 50
Number of pages: 11
Related Papers
50 records
  • [31] ESTIMATING PERIOD FROM SPARSE, NOISY TIMING DATA
    Quinn, Barry G.
    Clarkson, I. Vaughan L.
    McKilliam, Robby G.
    2012 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP (SSP), 2012, : 193 - 196
  • [32] Adaptive infinite dropout for noisy and sparse data streams
    Ha Nguyen
    Hoang Pham
    Son Nguyen
    Ngo Van Linh
    Khoat Than
    MACHINE LEARNING, 2022, 111 (08) : 3025 - 3060
  • [33] Imputing compound activities based on sparse and noisy data
    Whitehead, Thomas
    Irwin, Ben
    Hunt, Peter
    Segall, Matthew
    Conduit, Gareth
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2019, 257
  • [34] Machine-Learning Methods on Noisy and Sparse Data
    Poulinakis, Konstantinos
    Drikakis, Dimitris
    Kokkinakis, Ioannis W.
    Spottswood, Stephen Michael
    MATHEMATICS, 2023, 11 (01)
  • [35] Sparse and Non-Negative BSS for Noisy Data
    Rapin, Jeremy
    Bobin, Jerome
    Larue, Anthony
    Starck, Jean-Luc
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2013, 61 (22) : 5620 - 5632
  • [37] Feature engineering to cope with noisy data in sparse identification
    Franca, Thayna
    Barbosa Braga, Arthur Martins
    Hultmann Ayala, Helon Vicente
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 188
  • [38] Association Factor for Identifying Linear and Nonlinear Correlations in Noisy Conditions
    Kachouie, Nezamoddin N.
    Deebani, Wejdan
    ENTROPY, 2020, 22 (04)
  • [39] Multiple Recurrence and Finding Patterns in Dense Sets
    Austin, Tim
    DYNAMICS AND ANALYTIC NUMBER THEORY, 2016, : 189 - 257
  • [40] QSAR/QSPR modelling - Finding rules in noisy data?
    Darvas, Ferenc
    Kappe, Oliver
    Schneider, Gisbert
    Wiese, Michael
    Kubinyi, Hugo
    QSAR & COMBINATORIAL SCIENCE, 2006, 25 (10): : 811 - 812