Exact Fuzzy k-Nearest Neighbor Classification for Big Datasets

Cited by: 0
Authors
Maillo, Jesus [1]
Luengo, Julian [1]
Garcia, Salvador [1]
Herrera, Francisco [1]
Triguero, Isaac [2]
Affiliations
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain
[2] Univ Nottingham, Sch Comp Sci, Jubilee Campus, Nottingham NG8 1BB, England
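Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract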
The k-Nearest Neighbors (kNN) classifier is one of the most effective methods for supervised learning problems. It classifies unseen cases by comparing their similarity to the training data. However, it gives every labeled sample the same importance when classifying. Several approaches have been proposed to enhance its precision, with the Fuzzy k-Nearest Neighbors (Fuzzy-kNN) classifier being among the most successful. Fuzzy-kNN computes a fuzzy degree of membership of each instance to the classes of the problem and, as a result, generates smoother borders between classes. Although a kNN approach for handling big datasets exists, there is no fuzzy variant able to manage that volume of data. Moreover, calculating the class memberships adds extra computational cost, which makes the method even less scalable to large datasets because of its memory needs and high runtime. In this work, we present an exact and distributed approach, based on Spark, to run the Fuzzy-kNN classifier on big datasets; it provides the same precision as the original algorithm. It consists of two separate stages. The first stage transforms the training set by adding the class membership degrees. The second stage classifies the test set with the kNN algorithm, using the class memberships computed previously. In our experiments, we study the scaling-up capabilities of the proposed approach on datasets of up to 11 million instances, showing promising results.
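
The two stages described in the abstract can be illustrated with a minimal single-machine sketch in Python. It is not the authors' Spark implementation, and the abstract does not state the membership or voting formulas. The sketch assumes the classic Fuzzy-kNN formulation (own class gets 0.51 plus 0.49 times its share among the k neighbors, other classes get 0.49 times their share, and votes are weighted by inverse distance with fuzziness exponent m). The function names `k_nearest`, `compute_memberships`, and `classify` are hypothetical.

```python
import math
from collections import Counter


def k_nearest(train_X, query, k, skip=None):
    """Return up to k (distance, index) pairs of the training points closest to query."""
    dists = [
        (math.dist(x, query), i)
        for i, x in enumerate(train_X)
        if i != skip  # leave the sample itself out when asked to
    ]
    return sorted(dists)[:k]


def compute_memberships(train_X, train_y, classes, k):
    """Stage 1: replace each crisp label with per-class membership degrees.

    Assumed formula (classic Fuzzy-kNN, not given in the abstract):
    0.51 + 0.49 * share for the sample's own class, 0.49 * share otherwise,
    where share is the fraction of its nearest neighbors in that class.
    """
    memberships = []
    for i, x in enumerate(train_X):
        neighbors = k_nearest(train_X, x, k, skip=i)
        counts = Counter(train_y[j] for _, j in neighbors)
        row = {}
        for c in classes:
            share = 0.49 * counts[c] / len(neighbors)
            row[c] = share + 0.51 if c == train_y[i] else share
        memberships.append(row)
    return memberships


def classify(train_X, memberships, classes, query, k, m=2.0):
    """Stage 2: kNN vote where each neighbor contributes its membership degrees,
    weighted by inverse distance (assumed weighting, exponent 2 / (m - 1))."""
    neighbors = k_nearest(train_X, query, k)
    # Clamp tiny distances so an exact match does not divide by zero.
    weights = [1.0 / max(d, 1e-12) ** (2.0 / (m - 1.0)) for d, _ in neighbors]
    total = sum(weights)
    scores = {
        c: sum(w * memberships[j][c] for w, (_, j) in zip(weights, neighbors)) / total
        for c in classes
    }
    return max(scores, key=scores.get), scores
```

Because stage 1 depends only on the training set, its output can be computed once and reused for every test instance, which is the separation the abstract describes. The brute-force neighbor search here is for illustration only and would not scale to the dataset sizes mentioned.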
Pages: 6