Unbalanced Big Data Classification Based on Improved Random Forest Algorithm

Authors
Zheng, Xin [1]
Huang, Li [2]
Affiliations
[1] Jiangxi Univ Technol, Artificial Intelligence Dept, 115 Ziyang Ave, Nanchang 330098, Peoples R China
[2] Jiangxi Univ Technol, Informat Engn Coll, 115 Ziyang Ave, Nanchang 330098, Peoples R China
Keywords
Improved RF algorithm; Unbalanced data; Classification recognition; DIAGNOSIS
DOI
10.24507/ijicic.20.02.575
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Big data analytics has developed rapidly in recent years, and data mining has become a positive driver of development in many fields. However, the data in many of these fields are severely imbalanced, and current research on classifying such big data still has many limitations. To address this problem, the study uses a class-distinction-based K-means algorithm to perform approximate dimensionality reduction on the data. It also uses the unscented Kalman filter (UKF) combined with the Sage-Husa adaptive filter to reduce noise in the data. The denoised, dimensionality-reduced data are then used to build an improved random forest algorithm (K-U-S-H-RF). However, when K-S-H-RF was applied to low-dimensional imbalanced data, the study found that the random forest algorithm did not account for the actual distribution of the data set and therefore classified the data poorly. To correct this, the study introduces cost sensitivity, applying cost-based error calculation to both the decision trees and the voting stage. It also parallelizes the random forest following the MapReduce paradigm to further optimize K-S-H-RF. On this basis, the study constructs an imbalanced big data classification model based on the improved random forest. The model can effectively classify imbalanced big data and offers a new path for applying big data in more fields, supporting the development of the big data era.
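The abstract does not define how the cost-based error calculation and voting work. As a rough illustration only, the Python sketch below shows one common way to make a random forest's aggregation cost-sensitive. Each tree is weighted by its average misclassification cost on held-out data. The forest then predicts the class with the minimum expected cost under a cost matrix `cost[true][pred]`. The function names, the `1 / (1 + error)` tree weighting, and the cost values are hypothetical assumptions, not the method described in the paper.

```python
from typing import List, Sequence

# Cost matrix convention (an assumption for this sketch): cost[true][pred] is the
# penalty for predicting class `pred` when the true class is `true`.
CostMatrix = Sequence[Sequence[float]]


def tree_cost_error(preds: Sequence[int], labels: Sequence[int], cost: CostMatrix) -> float:
    """Average misclassification cost of one tree on held-out samples."""
    total = sum(cost[y][p] for p, y in zip(preds, labels))
    return total / len(labels)


def tree_weights(all_tree_preds: Sequence[Sequence[int]], labels: Sequence[int],
                 cost: CostMatrix) -> List[float]:
    """Weight each tree by 1 / (1 + cost error), normalized to sum to 1.

    The 1 / (1 + error) form is a hypothetical choice; it avoids division by
    zero when a tree has zero cost error on the held-out data.
    """
    raw = [1.0 / (1.0 + tree_cost_error(p, labels, cost)) for p in all_tree_preds]
    total = sum(raw)
    return [w / total for w in raw]


def cost_sensitive_vote(tree_votes: Sequence[int], weights: Sequence[float],
                        cost: CostMatrix, n_classes: int) -> int:
    """Return the class with minimum expected cost under the weighted votes."""
    # Weighted share of the forest voting for each class (sums to 1 if weights do).
    share = [0.0] * n_classes
    for vote, w in zip(tree_votes, weights):
        share[vote] += w
    # Expected cost of predicting k, treating the vote shares as class probabilities.
    expected = [sum(share[c] * cost[c][k] for c in range(n_classes))
                for k in range(n_classes)]
    return min(range(n_classes), key=lambda k: expected[k])


if __name__ == "__main__":
    # Hypothetical 2-class example: class 1 is the minority class, and missing it
    # (true 1, predicted 0) is assumed to cost 5 times as much as a false alarm.
    cost = [[0.0, 1.0],
            [5.0, 0.0]]
    labels = [0, 0, 1]                          # held-out labels
    preds = [[0, 0, 1], [0, 0, 0], [1, 0, 1]]   # each tree's held-out predictions
    print(tree_weights(preds, labels, cost))
    print(cost_sensitive_vote([0, 0, 1], [1 / 3] * 3, cost, 2))  # → 1
```

In this toy example, the votes [0, 0, 1] form a 2-to-1 majority for class 0. The forest still predicts class 1, because missing the minority class is assumed to be five times as costly. With a symmetric cost matrix, the same votes yield class 0, which is ordinary majority voting.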
Pages: 575-590
Page count: 16