Evidential instance selection for K-nearest neighbor classification of big data

Citations: 18
Authors
Gong, Chaoyu [1 ]
Su, Zhi-gang [1 ]
Wang, Pei-hong [1 ]
Wang, Qian [2 ]
You, Yang [3 ]
Affiliations
[1] Southeast Univ, Sch Energy & Environm, Nanjing 210096, Peoples R China
[2] Jiangsu Univ Sci & Technol, Sch Energy & Power, Zhenjiang 212003, Jiangsu, Peoples R China
[3] Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore
Funding
National Natural Science Foundation of China;
Keywords
Evidence theory; Information fusion; Instance selection; Apache Spark; Big data; LARGE DATA SETS; PROTOTYPE REDUCTION; MAPREDUCE; ALGORITHM; CONDENSATION; PERFORMANCE;
DOI
10.1016/j.ijar.2021.08.006
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Many instance selection algorithms have been introduced to reduce the high storage requirements and computational complexity of K-nearest neighbor (K-NN) classification rules. However, many studies still do not fully exploit the information provided by the neighbors of an instance. This information usually takes the form of a quantitative metric for deciding whether an instance should be selected, so many instances may receive the same score, which makes the selection results ambiguous. In addition, the proposed metrics are simply added together without deeper fusion, and the resulting information loss has further negative effects. To address these issues, we propose a new instance selection algorithm for K-NN rules in the evidence theory framework, called evidential instance selection (EIS). The basic idea is that all neighbors of each instance first provide distinct items of evidence regarding the estimated value of its label (the estimation label). After fusing these items of evidence and computing the conflict among them, instances with higher conflict are considered more likely to lie near class boundaries. Finally, the selection of boundary instances is formalized as an optimization problem whose objective function considers both the reduction rate and the classification accuracy. For big data sets, EIS is extended into a distributed and parallel version called EIS-AS, which uses Apache Spark to alleviate the computational bottleneck. We tested EIS and EIS-AS on 30 small data sets and six big data sets, respectively, the latter containing up to 11 million instances. The experimental results showed that EIS performed well at simplifying the raw training data and that EIS-AS could cope with big data sets effectively. (C) 2021 Elsevier Inc. All rights reserved.
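The abstract's core idea (each neighbor contributes a piece of evidence about an instance's label, and the conflict that arises when this evidence is fused flags boundary instances) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the mass construction follows the standard evidential K-NN form m({y_j}) = alpha * exp(-gamma * d_j^2), and the parameter values, function names, and the use of an unnormalized Dempster combination as the conflict measure are all assumptions made here for clarity.

```python
import numpy as np

def neighbor_masses(distances, labels, n_classes, alpha=0.95, gamma=1.0):
    # Each neighbor j supports its own class with mass alpha*exp(-gamma*d_j^2)
    # and leaves the rest on the whole frame Omega (ignorance).
    # alpha and gamma are illustrative values, not the paper's settings.
    masses = []
    for d, y in zip(distances, labels):
        m = np.zeros(n_classes + 1)          # last slot holds m(Omega)
        s = alpha * np.exp(-gamma * d ** 2)
        m[y] = s
        m[-1] = 1.0 - s
        masses.append(m)
    return masses

def combine_conflict(masses, n_classes):
    # Unnormalized Dempster combination of singleton/Omega mass functions.
    # The mass that "leaks" to the empty set is the conflict: high when
    # neighbors vote for different classes, i.e. near a class boundary.
    fused = masses[0].copy()
    for m in masses[1:]:
        new = np.zeros(n_classes + 1)
        for c in range(n_classes):
            # singleton {c} survives only when neither source contradicts it
            new[c] = fused[c] * m[c] + fused[c] * m[-1] + fused[-1] * m[c]
        new[-1] = fused[-1] * m[-1]          # both sources ignorant
        fused = new
    conflict = 1.0 - fused.sum()
    return fused, conflict
```

With two equally close neighbors, conflict stays near zero when they agree on the class and grows large when they disagree, which is the signal EIS uses to rank candidate boundary instances before the selection step.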
Pages: 123-144 (22 pages)
Related Papers (50 total)
  • [1] Scalable Evidential K-Nearest Neighbor Classification on Big Data
    Gong, Chaoyu
    Demmel, Jim
    You, Yang
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2024, 10 (03) : 226 - 237
  • [2] Evidential classification of incomplete instance based on K-nearest centroid neighbor
    Ma, Zong-Fang
    Liu, Zhe
    Luo, Chan
    Song, Lin
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 41 (06) : 7101 - 7115
  • [4] Joint Evidential K-Nearest Neighbor Classification
    Gong, Chaoyu
    Li, Yongbin
    Liu, Yong
    Wang, Pei-hong
    You, Yang
    [J]. 2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 2113 - 2126
  • [5] An instance selection algorithm for fuzzy K-nearest neighbor
    Zhai, Junhai
    Qi, Jiaxing
    Zhang, Sufang
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (01) : 521 - 533
  • [6] An Evidential K-Nearest Neighbor Classification Method with Weighted Attributes
    Jiao, Lianmeng
    Pan, Quan
    Feng, Xiaoxue
    Yang, Feng
    [J]. 2013 16TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2013, : 145 - 150
  • [7] Comparative Analysis of K-Nearest Neighbor and Modified K-Nearest Neighbor Algorithm for Data Classification
    Okfalisa
    Mustakim
    Gazalba, Ikbal
    Reza, Nurul Gayatri Indah
    [J]. 2017 2ND INTERNATIONAL CONFERENCES ON INFORMATION TECHNOLOGY, INFORMATION SYSTEMS AND ELECTRICAL ENGINEERING (ICITISEE): OPPORTUNITIES AND CHALLENGES ON BIG DATA FUTURE INNOVATION, 2017, : 294 - 298
  • [8] Evidential Editing K-Nearest Neighbor Classifier
    Jiao, Lianmeng
    Denoeux, Thierry
    Pan, Quan
    [J]. SYMBOLIC AND QUANTITATIVE APPROACHES TO REASONING WITH UNCERTAINTY, ECSQARU 2015, 2015, 9161 : 461 - 471
  • [9] A quick evidential classification algorithm based on K-nearest neighbor rule
    Wang, Z
    Hu, WD
    Yu, WX
    [J]. 2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 3248 - 3252
  • [10] Random projections fuzzy k-nearest neighbor (RPFKNN) for big data classification
    Popescu, Mihail
    Keller, James M.
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2016, : 1813 - 1817