Exact Fuzzy k-Nearest Neighbor Classification for Big Datasets

Cited by: 0
Authors
Maillo, Jesus [1]
Luengo, Julian [1]
Garcia, Salvador [1]
Herrera, Francisco [1]
Triguero, Isaac [2]
Affiliations
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain
[2] Univ Nottingham, Sch Comp Sci, Jubilee Campus, Nottingham NG8 1BB, England
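Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code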
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The k-Nearest Neighbors (kNN) classifier is one of the most effective methods for supervised learning problems. It classifies unseen cases by comparing their similarity with the training data. However, it gives every labeled sample the same importance when classifying. Several approaches have been proposed to improve its precision, with the Fuzzy k-Nearest Neighbors (Fuzzy-kNN) classifier being among the most successful. Fuzzy-kNN computes a fuzzy degree of membership of each instance to the classes of the problem and, as a result, generates smoother borders between classes. Although a kNN approach for handling big datasets exists, there is no fuzzy variant able to manage that volume of data. Moreover, computing the class membership adds extra computational cost, which makes Fuzzy-kNN even less scalable on large datasets because of its memory needs and high runtime. In this work, we present an exact and distributed approach, based on Spark, to run the Fuzzy-kNN classifier on big datasets; it provides the same precision as the original algorithm. It consists of two separate stages. The first stage transforms the training set by adding the class membership degrees. The second stage classifies the test set with the kNN algorithm, using the class memberships computed previously. In our experiments, we study the scaling-up capabilities of the proposed approach on datasets of up to 11 million instances, showing promising results.
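The two stages described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' Spark implementation: it only shows the stage structure, and it assumes the membership and voting formulas of the classic Fuzzy-kNN formulation by Keller, Gray and Givens (1985), which the abstract does not spell out. The function names (`compute_memberships`, `classify`), the Euclidean distance, and the fuzziness parameter `m` are illustrative choices.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, points, k, skip=None):
    """Return (distance, index) pairs of the k points closest to query.

    `skip` excludes one index, so a training instance is not its own neighbor.
    """
    dists = [(euclidean(query, p), i) for i, p in enumerate(points) if i != skip]
    return sorted(dists)[:k]


def compute_memberships(train_x, train_y, classes, k):
    """Stage 1: attach a class membership vector to every training instance.

    Assumed rule (Keller et al.): the instance's own class gets
    0.51 + 0.49 * n_c / k, every other class gets 0.49 * n_c / k, where
    n_c is the number of its k nearest neighbors that belong to class c.
    """
    memberships = []
    for i, (x, label) in enumerate(zip(train_x, train_y)):
        neighbors = k_nearest(x, train_x, k, skip=i)
        counts = Counter(train_y[j] for _, j in neighbors)
        row = {}
        for c in classes:
            share = 0.49 * counts[c] / k
            row[c] = 0.51 + share if c == label else share
        memberships.append(row)
    return memberships


def classify(query, train_x, memberships, classes, k, m=2.0):
    """Stage 2: kNN search, then a distance-weighted vote over memberships."""
    neighbors = k_nearest(query, train_x, k)
    # Inverse-distance weights; the small floor avoids division by zero
    # when the query coincides with a training instance.
    weights = [1.0 / max(d, 1e-12) ** (2.0 / (m - 1.0)) for d, _ in neighbors]
    total = sum(weights)
    scores = {
        c: sum(w * memberships[j][c] for w, (_, j) in zip(weights, neighbors)) / total
        for c in classes
    }
    return max(scores, key=scores.get), scores


# Toy data: two well-separated clusters.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
classes = ["a", "b"]

memberships = compute_memberships(train_x, train_y, classes, k=2)
label, scores = classify((0.2, 0.2), train_x, memberships, classes, k=2)
```

Separating the stages means the membership vectors are computed once and reused for every test instance, which is the property the abstract relies on when it describes the second stage as a plain kNN pass over the transformed training set.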
Pages: 6
Related papers
50 in total
  • [1] Random projections fuzzy k-nearest neighbor(RPFKNN) for big data classification
    Popescu, Mihail
    Keller, James M.
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2016, : 1813 - 1817
  • [2] Fuzzy Monotonic K-Nearest Neighbor Versus Monotonic Fuzzy K-Nearest Neighbor
    Zhu, Hong
    Wang, Xizhao
    Wang, Ran
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2022, 30 (09) : 3501 - 3513
  • [3] Scalable Evidential K-Nearest Neighbor Classification on Big Data
    Gong, Chaoyu
    Demmel, Jim
    You, Yang
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2024, 10 (03) : 226 - 237
  • [4] The distance function effect on k-nearest neighbor classification for medical datasets
    Hu, Li-Yu
    Huang, Min-Wei
    Ke, Shih-Wen
    Tsai, Chih-Fong
    [J]. SPRINGERPLUS, 2016, 5
  • [5] Microarray Data Classification using Fuzzy K-Nearest Neighbor
    Kumar, Mukesh
    Rath, Santanu Ku
    [J]. 2014 INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2014, : 1032 - 1038
  • [6] A FUZZY K-NEAREST NEIGHBOR ALGORITHM
    KELLER, JM
    GRAY, MR
    GIVENS, JA
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1985, 15 (04) : 580 - 585
  • [7] Improved k-nearest neighbor classification
    Wu, YQ
    Ianakiev, K
    Govindaraju, V
    [J]. PATTERN RECOGNITION, 2002, 35 (10) : 2311 - 2318
  • [8] Analysis of the k-nearest neighbor classification
    Li, Jing
    Cheng, Ming
    [J]. INFORMATION SCIENCE AND MANAGEMENT ENGINEERING, VOLS 1-3, 2014, 46 : 1911 - 1917
  • [9] Exact bootstrap k-nearest neighbor learners
    Steele, Brian M.
    [J]. MACHINE LEARNING, 2009, 74 (03) : 235 - 255
  • [10] Evidential instance selection for K-nearest neighbor classification of big data
    Gong, Chaoyu
    Su, Zhi-gang
    Wang, Pei-hong
    Wang, Qian
    You, Yang
    [J]. INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2021, 138 : 123 - 144