MapReduce based distributed improved random forest model for graduates career classification

Authors
Qiao F. [1]
Ge Y. [1]
Kong W. [1]
Affiliation
[1] CIMS Research Center, College of Electronics and Information Engineering, Tongji University, Shanghai
Source
Systems Engineering Society of China, Vol. 37 (2017)
Funding
National Natural Science Foundation of China
Keywords
Big data processing; Data classification model; Machine learning; MapReduce
DOI
10.12011/1000-6788(2017)05-1383-10
Abstract
Educational data mining (EDM) is a research area that applies data mining technology to the education sector. In EDM research, data mining techniques are used to model sample datasets from the field of education, with the aim of studying and predicting test data with the help of effective statistical machine learning models. Combining machine learning models with distributed computing frameworks allows EDM to meet the needs of large-scale data processing while also providing tailored data recommendations and supporting future decision-making. Against this background, this study first trains and evaluates a range of data models in simulation, and then proposes an improved model that raises classification performance by adjusting the data model and by using an improved algorithm, based on a new information gain equation, to select the optimal feature to split on. Building on the best-performing data model identified in the preceding study, and given the application background of the "big data" era, we propose a new random forest algorithm for classifying large-scale datasets on the MapReduce distributed computing framework. Using MapReduce, we design and implement a new system to meet this requirement. In this system, a trained model can be serialized and deserialized between local disks and the distributed file system. © 2017, Editorial Board of Journal of Systems Engineering Society of China. All rights reserved.
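The following is a minimal Python sketch of the general idea described in the abstract, not the authors' implementation. It assumes that each data partition is handled by one "map" task that trains a tree on a bootstrap sample, and that the "reduce" step simply collects the trees into a forest that predicts by majority vote. The abstract does not give the new information gain equation, so the standard entropy-based information gain is used here as a stand-in. All function names, the one-level trees, and the use of `pickle` in place of writing to a distributed file system are illustrative assumptions.

```python
import math
import pickle
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Standard information gain of splitting on one categorical feature index.

    Assumption: this is the textbook formula, not the paper's modified one.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def train_stump(rows, labels, rng):
    """Map step (simulated): fit a one-level tree on a bootstrap sample."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    sample_rows = [rows[i] for i in idx]
    sample_labels = [labels[i] for i in idx]
    n_features = len(rows[0])
    # Consider only a random subset of features, as random forests do.
    candidates = rng.sample(range(n_features), max(1, int(math.sqrt(n_features))))
    best = max(candidates, key=lambda f: information_gain(sample_rows, sample_labels, f))
    votes = {}
    for row, label in zip(sample_rows, sample_labels):
        votes.setdefault(row[best], Counter())[label] += 1
    leaves = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
    default = Counter(sample_labels).most_common(1)[0][0]
    return {"feature": best, "leaves": leaves, "default": default}


def predict_stump(stump, row):
    """Return the leaf label for the row, or the default for unseen values."""
    return stump["leaves"].get(row[stump["feature"]], stump["default"])


def train_forest(partitions, seed=0):
    """Reduce step (simulated): gather one tree per non-empty partition."""
    rng = random.Random(seed)
    return [train_stump(rows, labels, rng) for rows, labels in partitions]


def predict_forest(forest, row):
    """Classify a row by majority vote over all trees."""
    return Counter(predict_stump(tree, row) for tree in forest).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: (major, internship) -> career label.
    rows = [("cs", "intern"), ("cs", "none"), ("art", "intern"), ("art", "none")]
    labels = ["engineer", "engineer", "designer", "designer"]
    forest = train_forest([(rows, labels)] * 5)
    # Serialize and restore the trained model, as the abstract describes.
    restored = pickle.loads(pickle.dumps(forest))
    print(predict_forest(restored, ("cs", "intern")))
```

In an actual Hadoop job the map and reduce steps would run as separate tasks across a cluster, and the serialized model bytes would be written to the distributed file system instead of being kept in memory.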
Pages: 1383-1392
Page count: 10