Proposing a Dimensionality Reduction Technique With an Inequality for Unsupervised Learning from High-Dimensional Big Data

Cited by: 0
Authors
Ismkhan, Hassan [1 ]
Izadi, Mohammad [1 ]
Affiliations
[1] Sharif Univ Technol, Fac Comp Engn, Tehran 1458889694, Iran
Keywords
Clustering algorithms; Task analysis; Feature extraction; Unsupervised learning; Dimensionality reduction; Transforms; Standards; Big data; dimensionality reduction (DR); high-dimensional data; k-means; nearest neighbor (NN)
DOI
10.1109/TSMC.2023.3234227
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The clustering task can be considered one of the most important unsupervised learning problems. For almost all clustering algorithms, finding the nearest neighbors of a point within a certain radius r (NN-r) is a critical task. For a high-dimensional dataset, this task becomes very time consuming. This article proposes a simple dimensionality reduction (DR) technique. For a point p in d-dimensional space, it produces a point p' in d'-dimensional space, where d' << d. In addition, for any pair of points p and q and their images p' and q' in the target space, it is proved that |p, q| >= |p', q'|, where |·, ·| denotes the Euclidean distance between a pair of points. This property can speed up finding NN-r: for a given radius r and a pair of points p and q, whenever |p', q'| > r, then q cannot be in the NN-r of p, so the exact distance |p, q| need not be computed. Using this pruning rule, the task of finding NN-r is sped up. Then, as a case study, the technique is applied to accelerate k-means, one of the most famous unsupervised learning algorithms, where it can automatically determine d'. The proposed NN-r method and the accelerated k-means are compared with recent state-of-the-art methods, and both yield favorable results.
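The abstract's pruning idea relies only on the map being contractive: if the reduced-space distance already exceeds r, the full-space distance must too, so the candidate can be discarded without computing it. The record does not spell out the paper's actual mapping, so the sketch below substitutes a hypothetical contractive map (orthogonal projection onto the first d' coordinates, which trivially satisfies |p', q'| <= |p, q|) purely to illustrate the pruning logic; `nn_r_with_pruning` and `d_reduced` are illustrative names, not the paper's API.

```python
import numpy as np

def nn_r_with_pruning(points, query, r, d_reduced=2):
    """Return indices of all points within radius r of `query`.

    Uses a contractive map (here: projection onto the first d'
    coordinates) as a cheap lower bound on the true distance.
    Because |p', q'| <= |p, q|, the filter never discards a true
    neighbor; it only skips full-distance computations.
    """
    q_low = query[:d_reduced]
    neighbors = []
    for i, p in enumerate(points):
        # Cheap check in the reduced space:
        # if |p', q'| > r, then |p, q| > r, so skip the full distance.
        if np.linalg.norm(p[:d_reduced] - q_low) > r:
            continue
        # Survivors get the exact d-dimensional distance check.
        if np.linalg.norm(p - query) <= r:
            neighbors.append(i)
    return neighbors
```

The filter is exact (no false negatives), so the result matches a brute-force radius search; the speedup comes from how many candidates the d'-dimensional check eliminates, which in the paper depends on the quality of the learned mapping rather than a naive coordinate projection.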
Pages: 3880-3889
Page count: 10