MAP-REDUCE BASED DISTANCE WEIGHTED K-NEAREST NEIGHBOR MACHINE LEARNING ALGORITHM FOR BIG DATA APPLICATIONS

Cited by: 6
Authors
Gothai, E. [1]
Muthukumaran, V. [2]
Valarmathi, K. [3]
Sathishkumar, V. E. [4]
Thillaiarasu, N. [5]
Karthikeyan, P. [6]
Affiliations
[1] Kongu Engn Coll, Dept Comp Sci & Engn, Erode 638060, Tamil Nadu, India
[2] REVA Univ, Sch Appl Sci, Dept Math, Bangalore 560064, Karnataka, India
[3] Panimalar Engn Coll, Trunk Rd, Bangalore 600123, Karnataka, India
[4] Hanyang Univ, Dept Ind Engn, Seoul, South Korea
[5] REVA Univ, Sch Comp & Informat Technol, Bangalore 560064, Karnataka, India
[6] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
Source
Keywords
Machine Learning; Big Data Analytics; MapReduce Programming; k-Nearest Neighbour; Classification; Prediction; Framework; Filter
DOI
10.12694/scpe.v23i4.1987
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835
Abstract
With the evolution of Internet standards and advances in Internet and mobile technologies, especially since Web 4.0, more and more web and mobile applications have emerged, such as e-commerce, social networks, online gaming, and Internet of Things based applications. Because these applications are deployed and accessed concurrently on the Internet and on mobile devices, the volume and variety of generated data grow exponentially, and the era of Big Data has arrived. Currently available data structures and data analysis algorithms are not capable of handling such Big Data. Hence, there is a need for scalable, flexible, parallel, and intelligent data analysis algorithms to handle and analyze complex, massive data. In this article, we propose a novel distributed supervised machine learning algorithm, called MR-DWkNN, based on the MapReduce programming model and the Distance Weighted k-Nearest Neighbor algorithm, to process and analyze Big Data in a Hadoop cluster environment. The proposed distributed algorithm is based on supervised learning and performs both regression and classification tasks on large volumes of data in Big Data applications. Three performance metrics are used to evaluate the proposed MR-DWkNN algorithm: Root Mean Squared Error (RMSE) and the coefficient of determination (R²) for regression tasks, and accuracy for classification tasks. Extensive experimental results show an average increase of 3% to 4.5% in prediction and classification performance compared with the standard distributed k-NN algorithm, and a considerable decrease in RMSE, with good parallelism characteristics of scalability and speedup, which demonstrates its effectiveness in Big Data predictive and classification applications.
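The abstract does not give the details of MR-DWkNN, so the following is only a minimal single-process sketch of the general idea it names: a map step finds the k nearest labelled points inside each data partition, and a reduce step merges those candidates and takes a distance-weighted vote. The inverse-distance weight `1 / (d + eps)`, the Euclidean metric, and the function names `map_partition`, `reduce_candidates`, and `classify` are assumptions made for illustration, not the authors' implementation, and the sketch covers classification only.

```python
import heapq
import math
from collections import defaultdict


def map_partition(partition, query, k):
    """Map step: return the k nearest (distance, label) pairs in one partition."""
    scored = ((math.dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


def reduce_candidates(candidate_lists, k, eps=1e-9):
    """Reduce step: merge local candidates, keep the global k nearest,
    and vote with inverse-distance weights (an assumed weighting scheme)."""
    merged = [pair for candidates in candidate_lists for pair in candidates]
    nearest = heapq.nsmallest(k, merged, key=lambda pair: pair[0])
    votes = defaultdict(float)
    for distance, label in nearest:
        # eps avoids division by zero when the query equals a training point.
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


def classify(partitions, query, k=3):
    """Run the map step on every partition, then a single reduce step."""
    return reduce_candidates([map_partition(p, query, k) for p in partitions], k)


# Hypothetical toy data split across two partitions.
partitions = [
    [((0.1, 0.0), "a"), ((9.0, 9.0), "b")],
    [((2.0, 0.0), "b"), ((0.0, 2.0), "b")],
]
prediction = classify(partitions, (0.0, 0.0), k=3)
print(prediction)
```

In this toy data the three nearest neighbours of the query are one close "a" point and two more distant "b" points, so an unweighted majority vote would choose "b" while the inverse-distance vote chooses "a". In a real Hadoop job the map and reduce steps would run as separate tasks over distributed input splits.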
Pages: 129-145
Number of pages: 17
Related papers
50 in total
  • [31] Missing Data Filling Algorithm for Big Data-Based Map-Reduce Technology
    Li, Fugui
    Sharma, Ashutosh
    INTERNATIONAL JOURNAL OF E-COLLABORATION, 2022, 18 (02)
  • [32] A Semi-supervised K-nearest Neighbor Algorithm Based on Data Editing
    Xie Yongfang
    Jiang Youwei
    Tang Mingzhu
    2011 CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-6, 2011, : 41 - 45
  • [33] A Hybrid Coupled k-Nearest Neighbor Algorithm on Imbalance Data
    Liu, Chunming
    Cao, Longbing
    Yu, Philip S.
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 2011 - 2018
  • [34] A Modified K-Nearest Neighbor Algorithm to Handle Uncertain Data
    Agrawal, Rashmi
    Ram, Babu
    2015 5TH INTERNATIONAL CONFERENCE ON IT CONVERGENCE AND SECURITY (ICITCS), 2015
  • [35] Evidential instance selection for K-nearest neighbor classification of big data
    Gong, Chaoyu
    Su, Zhi-gang
    Wang, Pei-hong
    Wang, Qian
    You, Yang
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2021, 138 : 123 - 144
  • [36] Application of the k-nearest neighbor (k-NN) machine learning algorithm for the identification of colorectal cancer based on microRNAs
    Fajar, Rifaldy
    Kurniastuti, Nana Indri
    Jupri, Prihantini
    Wulandari, Titik
    JOURNAL OF GASTROENTEROLOGY AND HEPATOLOGY, 2021, 36 : 54 - 54
  • [37] A feature weighted support vector machine and K-nearest neighbor algorithm for stock market indices prediction
    Chen, Yingjun
    Hao, Yongtao
    EXPERT SYSTEMS WITH APPLICATIONS, 2017, 80 : 340 - 355
  • [38] GM(1,1)-Based Weighted K-Nearest Neighbor Algorithm for Indoor Localization
    Xiang, Lai
    Xu, Ying
    Cui, Jianhui
    Liu, Yang
    Wang, Ruozhou
    Li, Guofeng
    REMOTE SENSING, 2023, 15 (15)
  • [39] A Digital Predistortion Technique Based on the Dimension Weighted Blind K-Nearest Neighbor Algorithm
    Jiang, Weiheng
    Duan, Yaoxing
    Li, Mingyu
    Jin, Yi
    Xu, Changzhi
    Li, Li
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2023, 45 (02) : 446 - 454
  • [40] Hybridisation of evolutionary programming and machine learning with k-nearest neighbor estimation
    He, Jingsong
    Yang, Zhenyu
    Yao, Xin
    2007 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-10, PROCEEDINGS, 2007, : 1693 - 1700