Contrasting Classical and Machine Learning Approaches in the Estimation of Value-Added Scores in Large-Scale Educational Data

Times cited: 8

Authors:

Levy, Jessica [1]

Mussack, Dominic [2]

Brunner, Martin [3]

Keller, Ulrich [1]

Cardoso-Leite, Pedro [2]

Fischbach, Antoine [1]

Affiliations:

[1] University of Luxembourg, Luxembourg Centre for Educational Testing, Esch-sur-Alzette, Luxembourg

[2] University of Luxembourg, Department of Behavioural and Cognitive Sciences, Esch-sur-Alzette, Luxembourg

[3] University of Potsdam, Department of Education, Potsdam, Germany

Source:

FRONTIERS IN PSYCHOLOGY | 2020 / Vol. 11

Keywords:

value-added modeling; school effectiveness; machine learning; model comparison; longitudinal data; MODELS; ACCOUNTABILITY; PSYCHOLOGY; PROGRESS;

DOI:

10.3389/fpsyg.2020.02190

Chinese Library Classification (CLC) number:

B84 [Psychology]

Discipline classification codes:

04; 0402

Abstract:

There is no consensus on which statistical model estimates school value-added (VA) most accurately. To date, the two most common statistical models used to calculate VA scores are classical methods: linear regression and multilevel models. These models have the advantage of being relatively transparent and thus understandable for most researchers and practitioners. However, they are bound to certain assumptions (e.g., linearity) that might limit their prediction accuracy. Machine learning methods, which have yielded spectacular results in numerous fields, may be a valuable alternative to these classical models. Although big data is not new in general, it is relatively new in the social sciences and education. New types of data require new data-analytical approaches. Such techniques have already evolved in fields with a long tradition of crunching big data (e.g., gene technology). The objective of the present paper is to competently apply these "imported" techniques to education data, more precisely to VA scores, and to assess when and how they can extend or replace the classical psychometrics toolbox. The models compared include linear and non-linear methods and extend the classical models with the most commonly used machine learning methods (i.e., random forest, neural networks, support vector machines, and boosting). We used representative data from 3,026 students in 153 schools who took part in the standardized achievement tests of the Luxembourg School Monitoring Program in grades 1 and 3. Multilevel models outperformed classical linear and polynomial regressions, as well as the different machine learning models. However, across all schools, VA scores from the different model types correlated highly. Yet the percentage of disagreements relative to the multilevel models was not trivial, and the real-life implications for individual schools may still be dramatic depending on the model type used. Implications of these results and possible ethical concerns regarding the use of machine learning methods for decision-making in education are discussed.
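To make the idea of comparing VA scores across model types concrete, the following is a minimal illustrative sketch, not the authors' code or data. It assumes the common residual-based definition of school VA (the mean residual of a regression of later achievement on prior achievement, averaged within each school) and uses simulated data. The function name `school_va`, the `degree` parameter, and all simulation settings are hypothetical. It compares a linear and a quadratic (polynomial) regression, mirroring two of the classical model types named in the abstract, by correlating their school VA scores and counting sign disagreements.

```python
import numpy as np

def school_va(y_pre, y_post, school, degree=1):
    """Hypothetical helper: school VA as the mean residual of a polynomial
    regression of later achievement (y_post) on prior achievement (y_pre)."""
    # Design matrix with columns 1, x, x^2, ..., x^degree
    X = np.vander(y_pre, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y_post, rcond=None)
    resid = y_post - X @ beta
    schools = np.unique(school)
    # Average the student-level residuals within each school
    return schools, np.array([resid[school == s].mean() for s in schools])

# Simulated data (assumption): 20 schools with 30 students each
rng = np.random.default_rng(0)
n_schools, n_per = 20, 30
school = np.repeat(np.arange(n_schools), n_per)
true_effect = rng.normal(0, 0.3, n_schools)
y_pre = rng.normal(0, 1, n_schools * n_per)
y_post = 0.7 * y_pre + true_effect[school] + rng.normal(0, 0.5, school.size)

ids, va_lin = school_va(y_pre, y_post, school, degree=1)
_, va_quad = school_va(y_pre, y_post, school, degree=2)

# Agreement between the two model types
r = np.corrcoef(va_lin, va_quad)[0, 1]
disagree = np.mean(np.sign(va_lin) != np.sign(va_quad))
print(f"correlation: {r:.3f}, share of sign disagreements: {disagree:.2%}")
```

Because both regressions include an intercept, the student residuals sum to zero, so with equal school sizes the school VA scores average to zero. A multilevel model would additionally shrink each school's estimate toward zero depending on its sample size, which this sketch does not attempt.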


Pages: 18

50 records in total

[21] Large-scale data mining using genetics-based machine learning
Bacardit, Jaume
Llora, Xavier
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2013, 3 (01) : 37 - 61
[22] Humanization of antibodies using a machine learning approach on large-scale repertoire data
Marks, Claire
Hummer, Alissa M.
Chin, Mark
Deane, Charlotte M.
BIOINFORMATICS, 2021, 37 (22) : 4041 - 4047
[23] ENHANCING INPUT PARAMETER ESTIMATION BY MACHINE LEARNING FOR THE SIMULATION OF LARGE-SCALE LOGISTICS NETWORKS
Liu, Yang
Yan, Liang
Liu, Sheng
Jiang, Ting
Zhang, Feng
Wang, Yu
Wu, Shengnan
2020 WINTER SIMULATION CONFERENCE (WSC), 2020, : 608 - 619
[24] Toward Large-Scale Riverine Phosphorus Estimation Using Remote Sensing and Machine Learning
Ramtel, Pradeep
Feng, Dongmei
Gardner, John
JOURNAL OF GEOPHYSICAL RESEARCH-BIOGEOSCIENCES, 2024, 129 (08)
[25] Rapid estimation of soil Mn content by machine learning and soil spectra in large-scale
Zhou, Min
Hu, Tao
Wu, Mengting
Ma, Chundi
Qi, Chongchong
ECOLOGICAL INFORMATICS, 2024, 81
[26] A Fast Machine Learning Model for Large-Scale Estimation of Annual Solar Irradiation on Rooftops
Walch, Alina
Castello, Roberto
Mohajeri, Nahid
Scartezzini, Jean-Louis
PROCEEDINGS OF THE ISES SOLAR WORLD CONFERENCE 2019 AND THE IEA SHC SOLAR HEATING AND COOLING CONFERENCE FOR BUILDINGS AND INDUSTRY 2019, 2019, : 2201 - 2210
[27] Predicting file downloading time in cellular network: Large-Scale analysis of machine learning approaches
Samba, Alassane
Busnel, Yann
Blanc, Alberto
Dooze, Philippe
Simon, Gwendal
COMPUTER NETWORKS, 2018, 145 : 243 - 254
[28] Machine learning based survival prediction in Glioma using large-scale registry data
Zhao, Rachel
Zhuge, Ying
Camphausen, Kevin
Krauze, Andra, V
HEALTH INFORMATICS JOURNAL, 2022, 28 (04)
[29] Analyzing large-scale human mobility data: a survey of machine learning methods and applications
Toch, Eran
Lerner, Boaz
Ben-Zion, Eyal
Ben-Gal, Irad
KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 58 (03) : 501 - 523
[30] A Case Study of Data Management Challenges Presented in Large-Scale Machine Learning Workflows
Lee, Claire Songhyun
Hewes, V.
Cerati, Giuseppe
Kowalkowski, Jim
Aurisano, Adam
Agrawal, Ankit
Choudhary, Alok
Liao, Wei-keng
2023 IEEE/ACM 23RD INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING, CCGRID, 2023, : 71 - 81
