Estimation of Locally Relevant Subspace in High-dimensional Data

Cited by: 6
Authors
Thudumu, Srikanth [1]
Branch, Philip [1]
Jin, Jiong [1]
Singh, Jugdutt [2]
Affiliations
[1] Swinburne Univ Technol, Melbourne, Vic, Australia
[2] Sarawak State Govt, Kuching, Malaysia
Keywords
High-dimensionality problem; Subspace methods; Outlier detection; Locally relevant subspace; Curse of dimensionality
DOI
10.1145/3373017.3373032
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
High-dimensional data is becoming increasingly available with the advent of big data and the IoT. Additional dimensions make data analysis cumbersome and increase the sparsity of data points, a problem known as the "curse of dimensionality". Global dimensionality reduction techniques are commonly used to address this problem; however, they are ineffective at revealing outliers hidden in the high-dimensional space, because the behaviour of these outliers is hidden in the subspace to which they belong. A locally relevant subspace is therefore needed to reveal them. In this paper, we present a technique that identifies a locally relevant subspace and its associated low-dimensional subspaces by deriving a final correlation score. To verify the effectiveness of the technique in determining the generalised locally relevant subspace, we evaluate the results against a benchmark data set. Our comparative analysis shows that the technique derives a locally relevant subspace consisting of the relevant dimensions present in the benchmark data set.
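The claim that an outlier can be hidden in the full space yet stand out in one subspace can be illustrated with a small synthetic sketch. This is not the paper's technique and does not compute its correlation score: the data, the hand-picked subspace (dimensions 0 and 1), the function name `hidden_outlier_demo`, and both scoring rules (Euclidean distance in the full space, Mahalanobis distance in the subspace) are assumptions chosen only for illustration.

```python
import numpy as np

def hidden_outlier_demo(n=500, d=20, seed=0):
    """Hypothetical illustration, not the method from the paper."""
    rng = np.random.default_rng(seed)

    # n inliers: dimensions 0 and 1 are strongly correlated,
    # every other dimension is independent standard-normal noise.
    X = rng.standard_normal((n, d))
    X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n)

    # One planted point: each coordinate is unremarkable on its own,
    # but the pair (dim 0, dim 1) breaks the correlation.
    outlier = np.ones(d)
    outlier[1] = -1.0
    X = np.vstack([X, outlier])

    # Full-space score: Euclidean distance from the sample mean.
    full_dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    # Number of points that look more extreme than the planted one.
    full_rank = int((full_dist > full_dist[-1]).sum())

    # Subspace score: squared Mahalanobis distance in dims (0, 1) only.
    sub = X[:, :2] - X[:, :2].mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(sub, rowvar=False))
    sub_score = np.einsum("ij,jk,ik->i", sub, inv_cov, sub)
    sub_top = int(np.argmax(sub_score))

    return full_rank, sub_top, n  # n is the index of the planted point

full_rank, sub_top, outlier_index = hidden_outlier_demo()
print("points more extreme in the full space:", full_rank)
print("top-scoring point in the subspace is the planted one:",
      sub_top == outlier_index)
```

In this sketch the planted point is unremarkable by full-space distance but receives the highest score once only the two correlated dimensions are considered. Here the relevant subspace is supplied by hand, whereas the paper's contribution is to estimate it.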
Pages: 6