MapReduce based distributed improved random forest model for graduates career classification

Authors
[1] Qiao, Fei
[2] Ge, Yanhao
[3] Kong, Weichang
Source
Systems Engineering Society of China, 2017, Vol. 37
Funding
National Natural Science Foundation of China
Keywords
Distributed computer systems - Statistical tests - Classification (of information) - Machine learning - File organization - Decision trees - Data handling - Data mining - Learning systems
DOI
10.12011/1000-6788(2017)05-1383-10
Abstract
Educational data mining (EDM) is a research area that applies data mining technology to the education sector. In EDM research, data mining techniques are used to model sample datasets from the field of education, with the aim of learning from and making predictions on test data with the help of effective statistical machine learning models. Combining machine learning models with distributed computing frameworks allows EDM to meet the needs of large-scale data processing while providing tailored data recommendations and supporting future decision-making. Against this background, this study first trains and evaluates a range of data models in simulation, and then proposes an improved model that raises classification performance by adjusting the model and by using an improved algorithm based on a new equation for information gain when selecting the optimal feature to split on. Building on the best-performing model from that earlier work and on the application context of the big data era, we propose a new random forest algorithm for classifying large-scale datasets, based on the distributed computing framework MapReduce. Using MapReduce, we design and implement a new system to meet this requirement. In this system, a trained model can be serialized and deserialized between local disks and the distributed file system. © 2017, Editorial Board of Journal of Systems Engineering Society of China. All rights reserved.
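The abstract does not give the paper's new information gain equation or its system design, so the following is only a minimal single-machine sketch of the general idea, under stated assumptions: it uses the standard entropy-based information gain (not the authors' improved formula), one-level trees on categorical features, a list comprehension standing in for the MapReduce map phase, a majority vote standing in for the reduce phase, and `pickle` bytes standing in for model files on a distributed file system. The toy data and all function names (`information_gain`, `train_stump`, `predict_forest`) are hypothetical.

```python
import math
import pickle
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Standard information gain of splitting on one categorical feature.
    Assumption: this is the textbook formula, not the paper's new equation."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def train_stump(rows, labels, n_features, rng):
    """'Map' step: fit a one-level tree on a bootstrap sample,
    choosing the best feature from a random feature subset."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    sample_rows = [rows[i] for i in idx]
    sample_labels = [labels[i] for i in idx]
    candidates = rng.sample(range(len(rows[0])), n_features)
    best = max(candidates, key=lambda f: information_gain(sample_rows, sample_labels, f))
    votes = {}
    for row, label in zip(sample_rows, sample_labels):
        votes.setdefault(row[best], Counter())[label] += 1
    leaves = {value: c.most_common(1)[0][0] for value, c in votes.items()}
    default = Counter(sample_labels).most_common(1)[0][0]
    return {"feature": best, "leaves": leaves, "default": default}

def predict_forest(forest, row):
    """'Reduce' step: aggregate the per-tree predictions by majority vote."""
    votes = Counter(t["leaves"].get(row[t["feature"]], t["default"]) for t in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (major, grade band) -> career label.
rows = [("cs", "high"), ("cs", "low"), ("art", "high"), ("art", "low")]
labels = ["engineer", "engineer", "designer", "designer"]

rng = random.Random(0)
# On a real cluster each tree would be trained by a separate map task.
forest = [train_stump(rows, labels, n_features=2, rng=rng) for _ in range(5)]

# Serialize and restore the trained model; a real system would write
# these bytes to local disk or to the distributed file system.
blob = pickle.dumps(forest)
restored = pickle.loads(blob)
prediction = predict_forest(restored, ("cs", "high"))
```

In this sketch the trees are independent of one another, which is what makes the training step a natural fit for a map phase; only the final vote needs to see all of them.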