MAP-REDUCE BASED DISTANCE WEIGHTED K-NEAREST NEIGHBOR MACHINE LEARNING ALGORITHM FOR BIG DATA APPLICATIONS

Times cited: 6

Authors:

Gothai, E. [1]

Muthukumaran, V. [2]

Valarmathi, K. [3]

Sathishkumar, V. E. [4]

Thillaiarasu, N. [5]

Karthikeyan, P. [6]

Affiliations:

[1] Kongu Engn Coll, Dept Comp Sci & Engn, Erode 638060, Tamil Nadu, India

[2] REVA Univ, Sch Appl Sci, Dept Math, Bangalore 560064, Karnataka, India

[3] Panimalar Engn Coll, Trunk Rd, Bangalore 600123, Karnataka, India

[4] Hanyang Univ, Dept Ind Engn, Seoul, South Korea

[5] REVA Univ, Sch Comp & Informat Technol, Bangalore 560064, Karnataka, India

[6] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India

Source:

SCALABLE COMPUTING-PRACTICE AND EXPERIENCE | 2022 / Vol. 23 / Issue 4

Keywords:

Machine Learning; Big Data Analytics; MapReduce Programming; k-Nearest Neighbour; Classification; Prediction; Framework; Filter

DOI:

10.12694/scpe.v23i4.1987

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

With the evolution of Internet standards and advances in Internet and mobile technologies, especially since Web 4.0, a growing number of web and mobile applications have emerged, such as e-commerce, social networks, online gaming, and Internet of Things (IoT) applications. Because these applications are deployed and accessed concurrently over the Internet and on mobile devices, both the volume and the variety of the data they generate grow exponentially, ushering in the era of Big Data. Existing data structures and data analysis algorithms are not capable of handling such Big Data. Hence, scalable, flexible, parallel, and intelligent data analysis algorithms are needed to handle and analyze complex, massive data. In this article, we propose MR-DWkNN, a novel distributed supervised machine learning algorithm based on the MapReduce programming model and the Distance Weighted k-Nearest Neighbor algorithm, to process and analyze Big Data in a Hadoop cluster environment. The proposed algorithm performs both regression and classification tasks on large-volume Big Data applications. Three performance metrics are used to evaluate MR-DWkNN: Root Mean Squared Error (RMSE) and the coefficient of determination (R²) for regression tasks, and accuracy for classification tasks. Extensive experimental results show an average improvement of 3% to 4.5% in prediction and classification performance compared with the standard distributed k-NN algorithm, together with a considerable decrease in RMSE and good parallel scalability and speedup, demonstrating the algorithm's effectiveness for Big Data prediction and classification applications.
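
The abstract describes MR-DWkNN only at a high level. As an illustration, here is a minimal, single-process Python sketch of how a distance-weighted k-NN can be split into map and reduce steps, plus the three metrics the abstract names. It rests on several assumptions that are not from the paper: Euclidean distance, inverse-distance weights w = 1/(d + ε), and a test set small enough to be sent to every mapper. All function names (`map_phase`, `shuffle`, `reduce_classify`, `reduce_regress`, `mr_dwknn`) are hypothetical. This is neither the authors' implementation nor the Hadoop API.

```python
import heapq
import math
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Point = Sequence[float]
Record = Tuple[Point, Any]      # (feature vector, class label or numeric target)
Candidate = Tuple[float, Any]   # (distance to the query, label or target)


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weight(d: float, eps: float = 1e-9) -> float:
    # ASSUMPTION: plain inverse-distance weighting; the paper's exact scheme may differ.
    return 1.0 / (d + eps)


def map_phase(split: List[Record], test_points: List[Point], k: int) -> Iterable[Tuple[int, List[Candidate]]]:
    """Hypothetical mapper: for one training split, emit (test_id, local k nearest candidates)."""
    for test_id, q in enumerate(test_points):
        local = heapq.nsmallest(k, ((euclidean(x, q), y) for x, y in split), key=lambda c: c[0])
        yield test_id, local


def shuffle(mapped: Iterable[Tuple[int, List[Candidate]]]) -> Dict[int, List[Candidate]]:
    """Group mapper output by test_id, as the MapReduce shuffle phase would."""
    groups: Dict[int, List[Candidate]] = defaultdict(list)
    for test_id, candidates in mapped:
        groups[test_id].extend(candidates)
    return groups


def reduce_classify(candidates: List[Candidate], k: int) -> Any:
    """Reducer: pick the global k nearest, then take a distance-weighted majority vote."""
    nearest = heapq.nsmallest(k, candidates, key=lambda c: c[0])
    votes: Dict[Any, float] = defaultdict(float)
    for d, label in nearest:
        votes[label] += weight(d)
    return max(votes, key=votes.get)


def reduce_regress(candidates: List[Candidate], k: int) -> float:
    """Reducer: pick the global k nearest, then take the distance-weighted mean of the targets."""
    nearest = heapq.nsmallest(k, candidates, key=lambda c: c[0])
    ws = [weight(d) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(ws, nearest)) / sum(ws)


def mr_dwknn(splits: List[List[Record]], test_points: List[Point], k: int,
             reducer: Callable[[List[Candidate], int], Any]) -> List[Any]:
    """Run map -> shuffle -> reduce in-process over the given training splits."""
    mapped = (pair for split in splits for pair in map_phase(split, test_points, k))
    groups = shuffle(mapped)
    return [reducer(groups[i], k) for i in range(len(test_points))]


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: List[float], y_pred: List[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true: List[Any], y_pred: List[Any]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Two toy training splits standing in for HDFS blocks.
    splits = [
        [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((5.0, 5.0), "b")],
        [((5.1, 4.9), "b"), ((0.2, 0.1), "a"), ((4.8, 5.2), "b")],
    ]
    preds = mr_dwknn(splits, [(0.05, 0.05), (5.0, 5.1)], k=3, reducer=reduce_classify)
    print(preds)                          # ['a', 'b']
    print(accuracy(["a", "b"], preds))    # 1.0
```

Each mapper forwards only its local k nearest candidates per test point. The global k nearest neighbours must lie in the union of these per-split lists, so the reducer recovers the same top-k that a single-machine search would find. Meanwhile, the shuffle carries at most k × (number of splits) records per test point.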

Pages: 129-145

Page count: 17