Contrasting Classical and Machine Learning Approaches in the Estimation of Value-Added Scores in Large-Scale Educational Data

Cited by: 8
Authors
Levy, Jessica [1 ]
Mussack, Dominic [2 ]
Brunner, Martin [3 ]
Keller, Ulrich [1 ]
Cardoso-Leite, Pedro [2 ]
Fischbach, Antoine [1 ]
Affiliations
[1] Univ Luxembourg, Luxembourg Ctr Educ Testing, Esch Sur Alzette, Luxembourg
[2] Univ Luxembourg, Dept Behav & Cognit Sci, Esch Sur Alzette, Luxembourg
[3] Univ Potsdam, Dept Educ, Potsdam, Germany
Source
FRONTIERS IN PSYCHOLOGY | 2020 / Vol. 11
Keywords
value-added modeling; school effectiveness; machine learning; model comparison; longitudinal data; MODELS; ACCOUNTABILITY; PSYCHOLOGY; PROGRESS
DOI
10.3389/fpsyg.2020.02190
Chinese Library Classification number
B84 [Psychology]
Subject classification code
04; 0402
Abstract
There is no consensus on which statistical model estimates school value-added (VA) most accurately. To date, the two most common models for calculating VA scores are classical methods: linear regression and multilevel models. These models are relatively transparent and thus understandable to most researchers and practitioners. However, they rest on certain assumptions (e.g., linearity) that might limit their prediction accuracy. Machine learning methods, which have yielded spectacular results in numerous fields, may be a valuable alternative to these classical models. Although big data is not new in general, it is relatively new in the social sciences and education, and new types of data require new analytical approaches. Such techniques have already evolved in fields with a long tradition of crunching big data (e.g., gene technology). The objective of the present paper is to competently apply these "imported" techniques to education data, more precisely to VA scores, and to assess when and how they can extend or replace the classical psychometric toolbox. The models compared include linear and non-linear methods and extend the classical models with the most commonly used machine learning methods (i.e., random forest, neural networks, support vector machines, and boosting). We used representative data from 3,026 students in 153 schools who took part in the standardized achievement tests of the Luxembourg School Monitoring Program in grades 1 and 3. Multilevel models outperformed classical linear and polynomial regressions as well as the different machine learning models. However, across all schools, VA scores from the different model types were highly correlated. Yet the percentage of disagreements relative to the multilevel models was not trivial, and the real-life implications for individual schools may still be dramatic depending on the model type used. Implications of these results and possible ethical concerns regarding the use of machine learning methods for decision-making in education are discussed.
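The abstract compares how different model types turn the same test data into school VA scores and then checks how well those scores agree. As a rough illustration only, the Python sketch below shows one common residual-based way to compute VA with linear regression (a school's mean of observed minus predicted grade-3 scores). It also includes a multilevel-style variant that shrinks each school's mean residual toward zero by the reliability factor τ²/(τ² + σ²/n_j), plus the two agreement checks the abstract mentions: the correlation of VA scores across schools and the share of schools classified differently. This is not the authors' pipeline. The following are all illustrative assumptions: the record layout `(school_id, grade1_score, grade3_score)`, the single prior-achievement predictor, the known variance components `tau2` and `sigma2`, the ±`threshold` band used to label schools, and the toy data. A real multilevel model would estimate the variance components (e.g. by REML) together with the fixed effects rather than reusing the OLS slope.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data (illustration only): (school_id, grade1_score, grade3_score).
TOY_RECORDS = [
    ("A", 0.0, 1.0), ("A", 2.0, 3.0),
    ("B", 0.0, -1.0), ("B", 2.0, 1.0),
    ("C", 1.0, 2.0),
]


def ols_fit(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def va_linear(records):
    """Residual-based VA: per-school mean of observed minus predicted grade-3 score."""
    intercept, slope = ols_fit([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for school, pre, post in records:
        residuals[school].append(post - (intercept + slope * pre))
    return {s: mean(res) for s, res in residuals.items()}


def va_shrunken(records, tau2, sigma2):
    """Multilevel-style VA (simplified): shrink each school's mean residual toward 0
    by tau2 / (tau2 + sigma2 / n_j). tau2 (between-school variance) and sigma2
    (within-school variance) are ASSUMED known here rather than estimated."""
    raw = va_linear(records)
    n = defaultdict(int)
    for school, _, _ in records:
        n[school] += 1
    return {s: raw[s] * tau2 / (tau2 + sigma2 / n[s]) for s in raw}


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


def disagreement_rate(va_a, va_b, threshold):
    """Share of schools given different labels (below / average / above)
    by two VA dictionaries, using a hypothetical +/- threshold band."""
    def label(value):
        if value < -threshold:
            return -1
        return 1 if value > threshold else 0

    schools = sorted(va_a)
    differ = sum(label(va_a[s]) != label(va_b[s]) for s in schools)
    return differ / len(schools)


if __name__ == "__main__":
    va = va_linear(TOY_RECORDS)
    ml = va_shrunken(TOY_RECORDS, tau2=1.0, sigma2=1.0)
    schools = sorted(va)
    print("linear VA:", va)
    print("shrunken VA:", ml)
    print("correlation:", pearson([va[s] for s in schools], [ml[s] for s in schools]))
    print("disagreement:", disagreement_rate(va, ml, threshold=0.5))
```

On this toy data, schools A and C have the same raw VA (about 0.8). Because C has only one student, it is shrunk more than A (to about 0.4 versus about 0.53). As a result, the two sets of scores correlate above 0.99 yet label one of the three schools differently. In miniature, this is the pattern the abstract describes: a high correlation across schools does not guarantee agreement for each individual school.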
Pages: 18
Related papers (50 in total)
  • [21] Large-scale data mining using genetics-based machine learning
    Bacardit, Jaume
    Llora, Xavier
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2013, 3 (01) : 37 - 61
  • [22] Humanization of antibodies using a machine learning approach on large-scale repertoire data
    Marks, Claire
    Hummer, Alissa M.
    Chin, Mark
    Deane, Charlotte M.
    BIOINFORMATICS, 2021, 37 (22) : 4041 - 4047
  • [23] ENHANCING INPUT PARAMETER ESTIMATION BY MACHINE LEARNING FOR THE SIMULATION OF LARGE-SCALE LOGISTICS NETWORKS
    Liu, Yang
    Yan, Liang
    Liu, Sheng
    Jiang, Ting
    Zhang, Feng
    Wang, Yu
    Wu, Shengnan
    2020 WINTER SIMULATION CONFERENCE (WSC), 2020, : 608 - 619
  • [24] Toward Large-Scale Riverine Phosphorus Estimation Using Remote Sensing and Machine Learning
    Ramtel, Pradeep
    Feng, Dongmei
    Gardner, John
    JOURNAL OF GEOPHYSICAL RESEARCH-BIOGEOSCIENCES, 2024, 129 (08)
  • [25] Rapid estimation of soil Mn content by machine learning and soil spectra in large-scale
    Zhou, Min
    Hu, Tao
    Wu, Mengting
    Ma, Chundi
    Qi, Chongchong
    ECOLOGICAL INFORMATICS, 2024, 81
  • [26] A Fast Machine Learning Model for Large-Scale Estimation of Annual Solar Irradiation on Rooftops
    Walch, Alina
    Castello, Roberto
    Mohajeri, Nahid
    Scartezzini, Jean-Louis
    PROCEEDINGS OF THE ISES SOLAR WORLD CONFERENCE 2019 AND THE IEA SHC SOLAR HEATING AND COOLING CONFERENCE FOR BUILDINGS AND INDUSTRY 2019, 2019, : 2201 - 2210
  • [27] Predicting file downloading time in cellular network: Large-Scale analysis of machine learning approaches
    Samba, Alassane
    Busnel, Yann
    Blanc, Alberto
    Dooze, Philippe
    Simon, Gwendal
    COMPUTER NETWORKS, 2018, 145 : 243 - 254
  • [28] Machine learning based survival prediction in Glioma using large-scale registry data
    Zhao, Rachel
    Zhuge, Ying
    Camphausen, Kevin
    Krauze, Andra, V
    HEALTH INFORMATICS JOURNAL, 2022, 28 (04)
  • [29] Analyzing large-scale human mobility data: a survey of machine learning methods and applications
    Toch, Eran
    Lerner, Boaz
    Ben-Zion, Eyal
    Ben-Gal, Irad
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 58 (03) : 501 - 523
  • [30] A Case Study of Data Management Challenges Presented in Large-Scale Machine Learning Workflows
    Lee, Claire Songhyun
    Hewes, V.
    Cerati, Giuseppe
    Kowalkowski, Jim
    Aurisano, Adam
    Agrawal, Ankit
    Choudhary, Alok
    Liao, Wei-keng
    2023 IEEE/ACM 23RD INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING, CCGRID, 2023, : 71 - 81