UNBALANCED BIG DATA CLASSIFICATION BASED ON IMPROVED RANDOM FOREST ALGORITHM

Authors
Zheng, Xin [1]
Huang, Li [2]
Affiliations
[1] Jiangxi Univ Technol, Artificial Intelligence Dept, 115 Ziyang Ave, Nanchang 330098, Peoples R China
[2] Jiangxi Univ Technol, Informat Engn Coll, 115 Ziyang Ave, Nanchang 330098, Peoples R China
Keywords
Improved RF algorithm; Unbalanced data; Classification recognition; DIAGNOSIS;
DOI
10.24507/ijicic.20.02.575
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Big data analytics has developed rapidly in recent years, and data mining has driven progress in many fields. However, data in many of these fields is severely imbalanced, and current research on big data classification still has many limitations. To address this problem, the study uses a K-means algorithm based on class distinction to approximately reduce the dimensionality of the data, and combines the unscented Kalman filter (UKF) with the Sage-Husa adaptive filter to reduce noise in the data. The noise-reduced and dimension-reduced data are then used to build an improved random forest algorithm (K-U-S-H-RF). However, when classifying low-dimensional unbalanced data with K-S-H-RF, the study found that the random forest algorithm did not take the actual distribution of the data set into account and did not classify the data effectively. For this reason, the study introduces cost sensitivity, applying a cost-based error calculation to both the decision trees and the voting. The random forest is parallelized following the MapReduce approach to optimize K-S-H-RF. The study then constructs an imbalanced big data classification model based on the improved random forest. The model can effectively classify unbalanced big data and offers a new path for applying big data in more fields, which should benefit the further development of big data.
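The abstract names two ideas that are easier to follow with a small example: weighting each tree's vote by a cost-based error, and combining the votes in a map/reduce style. The sketch below is only an illustration under stated assumptions, not the authors' K-S-H-RF implementation. The cost matrix `COST`, the weighting rule `1 / (1 + cost error)`, and all function names are hypothetical choices made for this example; the paper's actual formulas are not given in the abstract.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical cost matrix, COST[true][predicted]. Missing a minority-class
# sample (true=1, predicted=0) is assumed to cost five times a false alarm.
COST = {0: {0: 0.0, 1: 1.0}, 1: {0: 5.0, 1: 0.0}}


def cost_error(preds, labels, cost=COST):
    """Average misclassification cost of one tree on held-out samples."""
    return sum(cost[y][p] for p, y in zip(preds, labels)) / len(labels)


def tree_weight(preds, labels):
    """Assumed rule: a lower cost error gives the tree a larger vote weight."""
    return 1.0 / (1.0 + cost_error(preds, labels))


def map_vote(tree_output):
    """Map step: one tree emits {class: weight} for a single sample."""
    predicted_class, weight = tree_output
    return {predicted_class: weight}


def reduce_votes(acc, vote):
    """Reduce step: sum the vote weights per class."""
    merged = defaultdict(float, acc)
    for cls, w in vote.items():
        merged[cls] += w
    return dict(merged)


def weighted_vote(tree_outputs):
    """Return the class with the largest total weight across all trees."""
    totals = reduce(reduce_votes, map(map_vote, tree_outputs), {})
    return max(totals, key=totals.get)


# Held-out labels (class 1 is the minority) and three trees' predictions.
labels = [0, 0, 0, 1, 1]
tree_a = [0, 0, 0, 0, 0]  # ignores the minority class entirely
tree_b = [0, 0, 0, 0, 0]  # same behaviour as tree_a
tree_c = [0, 1, 0, 1, 1]  # one false alarm, no missed minority samples

weights = [tree_weight(t, labels) for t in (tree_a, tree_b, tree_c)]

# For a new sample, trees A and B predict 0 and tree C predicts 1.
outputs = [(0, weights[0]), (0, weights[1]), (1, weights[2])]
print(weighted_vote(outputs))  # prints 1
```

A plain majority vote would return class 0 here, because two of the three trees predict it. With cost-weighted voting, the two trees that miss every minority sample carry a weight of about 0.33 each, while the third tree carries about 0.83, so class 1 wins. In a real MapReduce job the map and reduce steps would run on separate workers; here they run in one process purely to show the data flow.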
Pages: 575-590
Page count: 16