Evidential instance selection for K-nearest neighbor classification of big data

Citations: 18
Authors
Gong, Chaoyu [1 ]
Su, Zhi-gang [1 ]
Wang, Pei-hong [1 ]
Wang, Qian [2 ]
You, Yang [3 ]
Affiliations
[1] Southeast Univ, Sch Energy & Environm, Nanjing 210096, Peoples R China
[2] Jiangsu Univ Sci & Technol, Sch Energy & Power, Zhenjiang 212003, Jiangsu, Peoples R China
[3] Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore
Funding
National Natural Science Foundation of China;
Keywords
Evidence theory; Information fusion; Instance selection; Apache Spark; Big data; LARGE DATA SETS; PROTOTYPE REDUCTION; MAPREDUCE; ALGORITHM; CONDENSATION; PERFORMANCE;
DOI
10.1016/j.ijar.2021.08.006
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Many instance selection algorithms have been introduced to reduce the high storage requirements and computational complexity of K-nearest neighbor (K-NN) classification rules. However, many studies still do not fully exploit the information provided by the neighbors of an instance. This information usually takes the form of a quantitative metric for deciding whether an instance should be selected, so many instances may receive the same score, which makes the selection results ambiguous. In addition, the proposed metrics are simply added together without deeper fusion, and the resulting information loss has further negative effects. To address these issues, we propose a new instance selection algorithm for K-NN rules in the evidence theory framework, called evidential instance selection (EIS). The basic idea is that all neighbors of each instance first provide distinct items of evidence regarding the estimated value of its label (the estimation label). After fusing these items of evidence and computing the conflict among them, instances with higher conflict are considered more likely to lie near class boundaries. Finally, the selection of boundary instances is formalized as an optimization problem whose objective function considers both the reduction rate and the classification accuracy. For big data sets, EIS is extended into a distributed and parallel version called EIS-AS, which uses Apache Spark to alleviate the computational bottleneck. We tested EIS and EIS-AS on 30 small data sets and six big data sets, respectively, the latter containing up to 11 million instances. The experimental results showed that EIS performed well at simplifying the raw training data and that EIS-AS could cope with big data sets effectively. (C) 2021 Elsevier Inc. All rights reserved.
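The abstract's core idea (each neighbor contributes a piece of evidence about an instance's label, and the conflict that arises when this evidence is fused flags boundary instances) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the mass construction follows the standard evidential K-NN form m({y_j}) = alpha * exp(-gamma * d_j^2), and the parameter values, function names, and the use of an unnormalized Dempster combination as the conflict measure are all assumptions made here for clarity.

```python
import numpy as np

def neighbor_masses(distances, labels, n_classes, alpha=0.95, gamma=1.0):
    # Each neighbor j supports its own class with mass alpha*exp(-gamma*d_j^2)
    # and leaves the rest on the whole frame Omega (ignorance).
    # alpha and gamma are illustrative values, not the paper's settings.
    masses = []
    for d, y in zip(distances, labels):
        m = np.zeros(n_classes + 1)          # last slot holds m(Omega)
        s = alpha * np.exp(-gamma * d ** 2)
        m[y] = s
        m[-1] = 1.0 - s
        masses.append(m)
    return masses

def combine_conflict(masses, n_classes):
    # Unnormalized Dempster combination of singleton/Omega mass functions.
    # The mass that "leaks" to the empty set is the conflict: high when
    # neighbors vote for different classes, i.e. near a class boundary.
    fused = masses[0].copy()
    for m in masses[1:]:
        new = np.zeros(n_classes + 1)
        for c in range(n_classes):
            # singleton {c} survives only when neither source contradicts it
            new[c] = fused[c] * m[c] + fused[c] * m[-1] + fused[-1] * m[c]
        new[-1] = fused[-1] * m[-1]          # both sources ignorant
        fused = new
    conflict = 1.0 - fused.sum()
    return fused, conflict
```

With two equally close neighbors, conflict stays near zero when they agree on the class and grows large when they disagree, which is the signal EIS uses to rank candidate boundary instances before the selection step.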
Pages: 123-144 (22 pages)
Related Papers (50 total)
  • [1] Scalable Evidential K-Nearest Neighbor Classification on Big Data
    Gong, Chaoyu
    Demmel, Jim
    You, Yang
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2024, 10 (03) : 226 - 237
  • [2] Evidential classification of incomplete instance based on K-nearest centroid neighbor
    Ma, Zong-Fang
    Liu, Zhe
    Luo, Chan
    Song, Lin
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 41 (06) : 7101 - 7115
  • [4] Joint Evidential K-Nearest Neighbor Classification
    Gong, Chaoyu
    Li, Yongbin
    Liu, Yong
    Wang, Pei-hong
    You, Yang
    [J]. 2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 2113 - 2126
  • [5] An instance selection algorithm for fuzzy K-nearest neighbor
    Zhai, Junhai
    Qi, Jiaxing
    Zhang, Sufang
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (01) : 521 - 533
  • [6] An Evidential K-Nearest Neighbor Classification Method with Weighted Attributes
    Jiao, Lianmeng
    Pan, Quan
    Feng, Xiaoxue
    Yang, Feng
    [J]. 2013 16TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2013, : 145 - 150
  • [7] Comparative Analysis of K-Nearest Neighbor and Modified K-Nearest Neighbor Algorithm for Data Classification
    Okfalisa
    Mustakim
    Gazalba, Ikbal
    Reza, Nurul Gayatri Indah
    [J]. 2017 2ND INTERNATIONAL CONFERENCES ON INFORMATION TECHNOLOGY, INFORMATION SYSTEMS AND ELECTRICAL ENGINEERING (ICITISEE): OPPORTUNITIES AND CHALLENGES ON BIG DATA FUTURE INNOVATION, 2017, : 294 - 298
  • [8] Evidential Editing K-Nearest Neighbor Classifier
    Jiao, Lianmeng
    Denoeux, Thierry
    Pan, Quan
    [J]. SYMBOLIC AND QUANTITATIVE APPROACHES TO REASONING WITH UNCERTAINTY, ECSQARU 2015, 2015, 9161 : 461 - 471
  • [9] A quick evidential classification algorithm based on K-nearest neighbor rule
    Wang, Z
    Hu, WD
    Yu, WX
    [J]. 2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 3248 - 3252
  • [10] Random projections fuzzy k-nearest neighbor (RPFKNN) for big data classification
    Popescu, Mihail
    Keller, James M.
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2016, : 1813 - 1817