A depth-based nearest neighbor algorithm for high-dimensional data classification

Cited by: 0

Authors:

Harikumar S. [1]

Aravindakshan Savithri A. [1]

Kaimal R. [1]

Affiliations:

[1] Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri

Source:

Turkish Journal of Electrical Engineering and Computer Sciences | 2019 / Vol. 27 / No. 06

Keywords:

Classification; Data-depth; Information gain; Nearest neighbor; Subspace-clustering;

DOI:

10.3906/ELK-1807-163

Abstract:

Nearest neighbor algorithms such as k-nearest neighbors (kNN) are fundamental supervised learning techniques that classify a query instance based on the class labels of its neighbors. However, large datasets are often not fully labeled, and the unknown probability distribution of the instances may be uneven. Moreover, kNN suffers from the curse of dimensionality, the difficulty of setting the optimal number of neighbors, and poor scalability for high-dimensional data. To overcome these challenges, we propose an improved approach to classification via a depth representation of subspace clusters formed from high-dimensional data. We offer a consistent and principled approach to dynamically choosing the nearest neighbors for classification of a query point by i) identifying structures and distributions of data; ii) extracting relevant features; and iii) deriving an optimum value of k that depends on the structure of the data, by representing the data with a data depth function. The resulting classification algorithm uses a depth-based representation of clusters to improve performance in terms of execution time and accuracy. Experiments on real-world datasets show that the proposed approach is at least two orders of magnitude faster on high-dimensional datasets and at least as accurate as traditional kNN. © TÜBİTAK.
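
The idea of classifying a query point through a depth-based representation of clusters can be illustrated with a minimal sketch. The abstract does not specify the depth function or the clustering step, so everything below is an assumption made for illustration: Mahalanobis depth stands in for the data depth function, one full-dimensional cluster per class stands in for subspace clustering, and the helper names (`mahalanobis_depth`, `fit_clusters`, `classify`) are hypothetical. It is not the authors' algorithm.

```python
import numpy as np

def mahalanobis_depth(x, mean, cov_inv):
    # Assumed depth function: 1 / (1 + squared Mahalanobis distance to the mean).
    # It equals 1 at the cluster center and decays toward 0 far from it.
    diff = x - mean
    return 1.0 / (1.0 + diff @ cov_inv @ diff)

def fit_clusters(X, y):
    # Simplification: summarize each class as a single cluster (mean, inverse
    # covariance) instead of running subspace clustering as in the paper.
    clusters = {}
    for label in np.unique(y):
        pts = X[y == label]
        # A small ridge term keeps the covariance matrix invertible.
        cov = np.cov(pts, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        clusters[label] = (pts.mean(axis=0), np.linalg.inv(cov))
    return clusters

def classify(query, clusters):
    # Assign the label of the cluster in which the query lies deepest.
    depths = {
        label: mahalanobis_depth(query, mean, cov_inv)
        for label, (mean, cov_inv) in clusters.items()
    }
    return max(depths, key=depths.get)

# Synthetic data: two well-separated Gaussian classes in three dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 3)),
    rng.normal(6.0, 1.0, size=(50, 3)),
])
y = np.array([0] * 50 + [1] * 50)

clusters = fit_clusters(X, y)
pred_a = classify(np.zeros(3), clusters)
pred_b = classify(np.full(3, 6.0), clusters)
```

In this sketch each query is compared against one summary per cluster rather than against every training point, which hints at why a cluster-level representation could reduce query time relative to plain kNN. The speedups reported in the abstract refer to the authors' method, not to this simplification.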

Pages: 4082 - 4101

Page count: 19

50 items in total

[1] A depth-based nearest neighbor algorithm for high-dimensional data classification
Harikumar, Sandhya
Aravindakshan Savithri, Akhil
Kaimal, Ramachandra
TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (06) : 4082 - 4101
[2] An algorithm for incremental nearest neighbor search in high-dimensional data spaces
Lee, DH
Lee, HD
Choi, IH
Kim, HJ
HUMAN SOCIETY AND THE INTERNET, PROCEEDINGS: INTERNET-RELATED SOCIO-ECONOMIC ISSUES, 2001, 2105 : 436 - 453
[3] A nearest neighbor search algorithm of high-dimensional data based on sequential NPsim matrix
李文法
Wang Gongming
Ma Nan
Liu Hongzhe
High Technology Letters, 2016, 22 (03) : 241 - 247
[4] Redefining nearest neighbor classification in high-dimensional settings
Lopez, Julio
Maldonado, Sebastian
PATTERN RECOGNITION LETTERS, 2018, 110 : 36 - 43
[5] A Heterogeneous High-Dimensional Approximate Nearest Neighbor Algorithm
Dubiner, Moshe
IEEE TRANSACTIONS ON INFORMATION THEORY, 2012, 58 (10) : 6646 - 6658
[6] High-dimensional shared nearest neighbor clustering algorithm
Yin, J
Fan, XL
Chen, YQ
Ren, JT
FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 494 - 502
[7] C-approximate nearest neighbor query algorithm based on learning for high-dimensional data
Yuan, Pei-Sen
Sha, Chao-Feng
Wang, Xiao-Ling
Zhou, Ao-Ying
Ruan Jian Xue Bao/Journal of Software, 2012, 23 (08) : 2018 - 2031
[8] A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach
Chen, Hao
Xia, Yin
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118 (541) : 719 - 731
[9] Fuzzy nearest neighbor clustering of high-dimensional data
Wang, HB
Yu, YQ
Zhou, DR
Meng, B
2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 2569 - 2572
[10] Nearest neighbor search on vertically partitioned high-dimensional data
Dellis, E
Seeger, B
Vlachou, A
DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2005, 3589 : 243 - 253