Proposing a Dimensionality Reduction Technique With an Inequality for Unsupervised Learning from High-Dimensional Big Data

Cited by: 0
Authors
Ismkhan, Hassan [1 ]
Izadi, Mohammad [1 ]
Affiliations
[1] Sharif Univ Technol, Fac Comp Engn, Tehran 1458889694, Iran
Keywords
Clustering algorithms; Task analysis; Feature extraction; Unsupervised learning; Dimensionality reduction; Transforms; Standards; Big data; dimensionality reduction (DR); high-dimensional data; k-means; nearest neighbor (NN)
DOI
10.1109/TSMC.2023.3234227
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The clustering task can be considered one of the most important unsupervised learning problems. For almost all clustering algorithms, finding the nearest neighbors of a point within a certain radius r (NN-r) is a critical task. For a high-dimensional dataset, this task becomes very time consuming. This article proposes a simple dimensionality reduction (DR) technique. For a point p in d-dimensional space, it produces a point p' in d'-dimensional space, where d' << d. In addition, for any pair of points p and q and their images p' and q' in the target space, it is proved that |p, q| >= |p', q'|, where |·, ·| denotes the Euclidean distance between a pair of points. This property can speed up finding NN-r: for a given radius r and a pair of points p and q, whenever |p', q'| > r, then q cannot be in the NN-r of p, so the exact distance |p, q| need not be computed. Using this pruning rule, the task of finding NN-r is sped up. Then, as a case study, the technique is applied to accelerate k-means, one of the most famous unsupervised learning algorithms, where it can automatically determine d'. The proposed NN-r method and the accelerated k-means are compared with recent state-of-the-art methods, and both yield favorable results.
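The abstract's pruning idea relies only on the map being contractive: if the reduced-space distance already exceeds r, the full-space distance must too, so the candidate can be discarded without computing it. The record does not spell out the paper's actual mapping, so the sketch below substitutes a hypothetical contractive map (orthogonal projection onto the first d' coordinates, which trivially satisfies |p', q'| <= |p, q|) purely to illustrate the pruning logic; `nn_r_with_pruning` and `d_reduced` are illustrative names, not the paper's API.

```python
import numpy as np

def nn_r_with_pruning(points, query, r, d_reduced=2):
    """Return indices of all points within radius r of `query`.

    Uses a contractive map (here: projection onto the first d'
    coordinates) as a cheap lower bound on the true distance.
    Because |p', q'| <= |p, q|, the filter never discards a true
    neighbor; it only skips full-distance computations.
    """
    q_low = query[:d_reduced]
    neighbors = []
    for i, p in enumerate(points):
        # Cheap check in the reduced space:
        # if |p', q'| > r, then |p, q| > r, so skip the full distance.
        if np.linalg.norm(p[:d_reduced] - q_low) > r:
            continue
        # Survivors get the exact d-dimensional distance check.
        if np.linalg.norm(p - query) <= r:
            neighbors.append(i)
    return neighbors
```

The filter is exact (no false negatives), so the result matches a brute-force radius search; the speedup comes from how many candidates the d'-dimensional check eliminates, which in the paper depends on the quality of the learned mapping rather than a naive coordinate projection.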
Pages: 3880-3889
Page count: 10