UNBALANCED BIG DATA CLASSIFICATION BASED ON IMPROVED RANDOM FOREST ALGORITHM

Authors
Zheng, Xin [1]
Huang, Li [2]
Affiliations
[1] Jiangxi Univ Technol, Artificial Intelligence Dept, 115 Ziyang Ave, Nanchang 330098, Peoples R China
[2] Jiangxi Univ Technol, Informat Engn Coll, 115 Ziyang Ave, Nanchang 330098, Peoples R China
Keywords
Improved RF algorithm; Unbalanced data; Classification recognition; DIAGNOSIS;
DOI
10.24507/ijicic.20.02.575
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Big data analytics has developed rapidly in recent years, and data mining has driven progress in many fields. However, data in many of these fields is severely unbalanced, and current research on classifying big data still has many limitations. To address this problem, the study uses a K-means algorithm based on class distinction to approximately reduce the dimensionality of the data, and combines the unscented Kalman filter (UKF) with the Sage-Husa adaptive Kalman filter to reduce noise in the data. The noise-reduced and dimension-reduced data are then used to improve the random forest algorithm (K-U-S-H-RF). However, when classifying low-dimensional unbalanced data with K-S-H-RF, the study found that the random forest algorithm did not take the actual distribution of the data set into account and did not classify the data effectively. For this reason, the study introduces cost sensitivity into the error calculation of the decision trees and into the voting. The random forest is parallelized following the MapReduce idea to optimize K-S-H-RF. The study then constructs an imbalanced big data classification model based on the improved random forest. The model can effectively classify unbalanced big data and offers a new path for applying big data in more fields.
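The abstract does not give the cost formula or the MapReduce layout, so the following is only a minimal sketch of how cost-sensitive voting over partitioned tree votes could look. It assumes the standard minimum-expected-cost decision rule in place of the paper's own (unstated) rule; the function names `merge_votes` and `cost_sensitive_vote`, the labels `"maj"` and `"min"`, and the cost values are all hypothetical.

```python
from collections import Counter
from functools import reduce


def merge_votes(partition_votes):
    """Reduce step: sum the per-partition tree votes into one Counter."""
    return reduce(lambda a, b: a + b,
                  (Counter(votes) for votes in partition_votes),
                  Counter())


def cost_sensitive_vote(counts, cost):
    """Return the label with the lowest expected misclassification cost.

    counts: Counter of tree votes per label.
    cost:   dict mapping (true_label, predicted_label) to a cost;
            pairs that are missing cost 0 (assumed, not from the paper).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes to aggregate")
    labels = sorted(set(counts) | {t for t, _ in cost} | {p for _, p in cost})

    def expected_cost(pred):
        # Vote fractions stand in for class probabilities.
        return sum(counts[true] / total * cost.get((true, pred), 0.0)
                   for true in labels)

    return min(labels, key=expected_cost)


# Map step stand-in: each inner list holds the votes of trees on one worker.
partitions = [["maj", "maj", "maj", "min"],
              ["maj", "maj", "min"],
              ["maj", "maj", "min"]]
counts = merge_votes(partitions)

# Hypothetical costs: missing a minority sample is five times worse.
skewed_cost = {("min", "maj"): 5.0, ("maj", "min"): 1.0}
uniform_cost = {("min", "maj"): 1.0, ("maj", "min"): 1.0}

skewed_label = cost_sensitive_vote(counts, skewed_cost)
uniform_label = cost_sensitive_vote(counts, uniform_cost)
```

With uniform costs the rule reduces to a plain majority vote, while the skewed costs let three minority votes out of ten override the majority, which is the kind of correction cost sensitivity is meant to provide on unbalanced data.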
Pages: 575-590
Number of pages: 16