Contrasting Classical and Machine Learning Approaches in the Estimation of Value-Added Scores in Large-Scale Educational Data

Cited by: 8
Authors
Levy, Jessica [1]
Mussack, Dominic [2]
Brunner, Martin [3]
Keller, Ulrich [1]
Cardoso-Leite, Pedro [2]
Fischbach, Antoine [1]
Affiliations
[1] Univ Luxembourg, Luxembourg Ctr Educ Testing, Esch Sur Alzette, Luxembourg
[2] Univ Luxembourg, Dept Behav & Cognit Sci, Esch Sur Alzette, Luxembourg
[3] Univ Potsdam, Dept Educ, Potsdam, Germany
Source
FRONTIERS IN PSYCHOLOGY | 2020 / Volume 11
Keywords
value-added modeling; school effectiveness; machine learning; model comparison; longitudinal data; MODELS; ACCOUNTABILITY; PSYCHOLOGY; PROGRESS
DOI
10.3389/fpsyg.2020.02190
Chinese Library Classification number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
There is no consensus on which statistical model estimates school value-added (VA) most accurately. To date, the two most common statistical models used for the calculation of VA scores are two classical methods: linear regression and multilevel models. These models have the advantage of being relatively transparent and thus understandable for most researchers and practitioners. However, these statistical models are bound to certain assumptions (e.g., linearity) that might limit their prediction accuracy. Machine learning methods, which have yielded spectacular results in numerous fields, may be a valuable alternative to these classical models. Although big data is not new in general, it is relatively new in the realm of social sciences and education. New types of data require new data analytical approaches. Such techniques have already evolved in fields with a long tradition in crunching big data (e.g., gene technology). The objective of the present paper is to competently apply these "imported" techniques to education data, more precisely VA scores, and assess when and how they can extend or replace the classical psychometrics toolbox. The different models include linear and non-linear methods and extend classical models with the most commonly used machine learning methods (i.e., random forest, neural networks, support vector machines, and boosting). We used representative data of 3,026 students in 153 schools who took part in the standardized achievement tests of the Luxembourg School Monitoring Program in grades 1 and 3. Multilevel models outperformed classical linear and polynomial regressions, as well as different machine learning models. However, it could be observed that across all schools, school VA scores from different model types correlated highly. Yet, the percentage of disagreements as compared to multilevel models was not trivial and real-life implications for individual schools may still be dramatic depending on the model type used. 
Implications of these results and possible ethical concerns regarding the use of machine learning methods for decision-making in education are discussed.
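
As a rough illustration of the simplest classical approach named in the abstract, the sketch below computes linear-regression VA scores as the mean residual per school: later achievement is regressed on prior achievement, and each school's score is the average amount by which its students exceed or fall short of the prediction. This is a minimal sketch under stated assumptions, not the authors' pipeline. The data are synthetic, the single predictor is a simplification, and the names `fit_ols`, `school_va`, and `true_effect` are hypothetical.

```python
import random
from collections import defaultdict


def fit_ols(x, y):
    """Return (intercept, slope) of a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def school_va(school_ids, prior, later):
    """Mean regression residual per school (one common VA definition)."""
    intercept, slope = fit_ols(prior, later)
    residuals = defaultdict(list)
    for school, x, y in zip(school_ids, prior, later):
        residuals[school].append(y - (intercept + slope * x))
    return {school: sum(r) / len(r) for school, r in residuals.items()}


# Synthetic data (assumption): three schools with known effects.
rng = random.Random(0)
true_effect = {"A": 5.0, "B": 0.0, "C": -5.0}
school_ids, prior, later = [], [], []
for school, effect in true_effect.items():
    for _ in range(200):
        x = rng.gauss(500, 100)  # prior achievement score
        school_ids.append(school)
        prior.append(x)
        later.append(100 + 0.8 * x + effect + rng.gauss(0, 10))

va = school_va(school_ids, prior, later)
for school in sorted(va):
    print(school, round(va[school], 2))
```

A multilevel model would instead estimate school effects as random intercepts, which shrinks the scores of small schools toward zero. The machine learning variants described in the abstract would replace the least-squares prediction step with a non-linear learner while keeping the same residual aggregation.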
Pages: 18