MAP-REDUCE BASED DISTANCE WEIGHTED K-NEAREST NEIGHBOR MACHINE LEARNING ALGORITHM FOR BIG DATA APPLICATIONS

Cited by: 6
Authors
Gothai, E. [1]
Muthukumaran, V. [2]
Valarmathi, K. [3]
Sathishkumar, V. E. [4]
Thillaiarasu, N. [5]
Karthikeyan, P. [6]
Affiliations
[1] Kongu Engn Coll, Dept Comp Sci & Engn, Erode 638060, Tamil Nadu, India
[2] REVA Univ, Sch Appl Sci, Dept Math, Bangalore 560064, Karnataka, India
[3] Panimalar Engn Coll, Bangalore Trunk Rd, Chennai 600123, Tamil Nadu, India
[4] Hanyang Univ, Dept Ind Engn, Seoul, South Korea
[5] REVA Univ, Sch Comp & Informat Technol, Bangalore 560064, Karnataka, India
[6] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
Source
Scalable Computing: Practice and Experience, Vol. 23, No. 4
Keywords
Machine Learning; Big Data Analytics; MapReduce Programming; k-Nearest Neighbour; Classification; Prediction; Framework; Filter
DOI
10.12694/scpe.v23i4.1987
CLC number (Chinese Library Classification)
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
With the evolution of Internet standards and advances in Internet and mobile technologies, especially since Web 4.0, more and more web and mobile applications have emerged, such as e-commerce, social networks, online gaming and Internet of Things (IoT) applications. Because these applications are deployed and accessed concurrently on the Internet and on mobile devices, the volume and variety of the data they generate are growing exponentially, and a new era of Big Data has arrived. Currently available data structures and data analysis algorithms are not capable of handling such Big Data. Hence, scalable, flexible, parallel and intelligent data analysis algorithms are needed to handle and analyze complex, massive data. In this article, we propose a novel distributed supervised machine learning algorithm, called MR-DWkNN, which is based on the MapReduce programming model and the Distance Weighted k-Nearest Neighbor algorithm, to process and analyze Big Data in a Hadoop cluster environment. The proposed distributed supervised learning algorithm performs both regression and classification tasks on large volumes of Big Data. Three performance metrics are used to evaluate MR-DWkNN: Root Mean Squared Error (RMSE) and the coefficient of determination (R²) for the regression task, and Accuracy for the classification task. Extensive experimental results show an average improvement of 3% to 4.5% in prediction and classification performance compared with the standard distributed k-NN algorithm, a considerable decrease in RMSE, and good parallelism characteristics in terms of scalability and speedup, demonstrating the algorithm's effectiveness for Big Data prediction and classification applications.
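The abstract does not spell out how MR-DWkNN divides work between mappers and reducers, so the following is only a minimal single-process sketch of one common MapReduce decomposition of distance-weighted k-NN, not the authors' implementation or Hadoop's API. It assumes Euclidean distance, inverse-distance weights 1/(d + eps), and that each mapper sees one split of the training data while every mapper sees all test points. The function names `map_phase`, `shuffle`, `reduce_phase` and `mr_dwknn` are hypothetical. The three evaluation metrics named in the abstract (RMSE, R², Accuracy) are included using their standard definitions.

```python
import heapq
import math
from collections import defaultdict


def euclidean(a, b):
    # Euclidean distance is assumed; the abstract does not name the metric.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_phase(split, test_points, k):
    """Mapper (hypothetical): for one training split, emit each test point's
    local k nearest candidates as (test_id, [(distance, target), ...])."""
    for tid, q in enumerate(test_points):
        dists = ((euclidean(q, x), y) for x, y in split)
        yield tid, heapq.nsmallest(k, dists, key=lambda c: c[0])


def shuffle(mapped):
    """Group all mappers' candidate lists by test_id, as a shuffle step would."""
    groups = defaultdict(list)
    for tid, cands in mapped:
        groups[tid].extend(cands)
    return groups


def reduce_phase(candidates, k, task, eps=1e-12):
    """Reducer: keep the global k nearest and combine them with
    inverse-distance weights 1/(d + eps) (an assumed weighting scheme)."""
    nearest = heapq.nsmallest(k, candidates, key=lambda c: c[0])
    weights = [1.0 / (d + eps) for d, _ in nearest]
    if task == "regression":
        # Weighted mean of the neighbours' numeric targets.
        return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)
    # Classification: weighted majority vote over class labels.
    votes = defaultdict(float)
    for w, (_, y) in zip(weights, nearest):
        votes[y] += w
    return max(votes, key=votes.get)


def mr_dwknn(splits, test_points, k=3, task="classification"):
    """Run map -> shuffle -> reduce over a list of training splits."""
    mapped = (pair for split in splits for pair in map_phase(split, test_points, k))
    groups = shuffle(mapped)
    return [reduce_phase(groups[tid], k, task) for tid in range(len(test_points))]


def rmse(y_true, y_pred):
    # Root Mean Squared Error for the regression task.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    # Fraction of correctly classified test points.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    splits = [
        [((0.0, 0.0), "a"), ((0.0, 1.0), "a")],
        [((5.0, 5.0), "b"), ((6.0, 5.0), "b")],
    ]
    print(mr_dwknn(splits, [(0.2, 0.3), (5.5, 4.8)], k=3))  # ['a', 'b']
```

The decomposition is exact because the global k nearest neighbours of a test point are always contained in the union of each split's local k nearest, so each mapper needs to send only k candidates per test point, which keeps shuffle traffic small regardless of split size.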
Pages: 129-145
Number of pages: 17