Contrasting Classical and Machine Learning Approaches in the Estimation of Value-Added Scores in Large-Scale Educational Data

Cited by: 8
Authors
Levy, Jessica [1]
Mussack, Dominic [2]
Brunner, Martin [3]
Keller, Ulrich [1]
Cardoso-Leite, Pedro [2]
Fischbach, Antoine [1]
Affiliations
[1] Univ Luxembourg, Luxembourg Ctr Educ Testing, Esch Sur Alzette, Luxembourg
[2] Univ Luxembourg, Dept Behav & Cognit Sci, Esch Sur Alzette, Luxembourg
[3] Univ Potsdam, Dept Educ, Potsdam, Germany
Source
FRONTIERS IN PSYCHOLOGY | 2020, Volume 11
Keywords
value-added modeling; school effectiveness; machine learning; model comparison; longitudinal data; MODELS; ACCOUNTABILITY; PSYCHOLOGY; PROGRESS;
DOI
10.3389/fpsyg.2020.02190
Chinese Library Classification number
B84 [Psychology];
Discipline classification code
04 ; 0402 ;
Abstract
There is no consensus on which statistical model estimates school value-added (VA) most accurately. To date, the two most common statistical models used for the calculation of VA scores are two classical methods: linear regression and multilevel models. These models have the advantage of being relatively transparent and thus understandable for most researchers and practitioners. However, these statistical models are bound to certain assumptions (e.g., linearity) that might limit their prediction accuracy. Machine learning methods, which have yielded spectacular results in numerous fields, may be a valuable alternative to these classical models. Although big data is not new in general, it is relatively new in the realm of social sciences and education. New types of data require new data analytical approaches. Such techniques have already evolved in fields with a long tradition in crunching big data (e.g., gene technology). The objective of the present paper is to competently apply these "imported" techniques to education data, more precisely VA scores, and assess when and how they can extend or replace the classical psychometrics toolbox. The different models include linear and non-linear methods and extend classical models with the most commonly used machine learning methods (i.e., random forest, neural networks, support vector machines, and boosting). We used representative data of 3,026 students in 153 schools who took part in the standardized achievement tests of the Luxembourg School Monitoring Program in grades 1 and 3. Multilevel models outperformed classical linear and polynomial regressions, as well as different machine learning models. However, it could be observed that across all schools, school VA scores from different model types correlated highly. Yet, the percentage of disagreements as compared to multilevel models was not trivial and real-life implications for individual schools may still be dramatic depending on the model type used. 
Implications of these results and possible ethical concerns regarding the use of machine learning methods for decision-making in education are discussed.
Pages: 18
Related papers
50 in total
  • [41] Comprehensive association between microRNA clusters and cancer: a machine learning study with large-scale data
    Yin, Mo
    Nojima, Masahiro
    CANCER SCIENCE, 2023, 114 : 1873 - 1873
  • [42] Defining disease endophenotypes in neovascular AMD by unsupervised machine learning of large-scale OCT data
    Seeboeck, Philipp
    Waldstein, Sebastian M.
    Donner, Rene
    Gerendas, Bianca S.
    Sadeghipour, Amir
    Osborne, Aaron
    Schmidt-Erfurth, Ursula
    Langs, Georg
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2017, 58 (08)
  • [43] Efficient Large-Scale Machine Learning Techniques for Rapid Motif Discovery in Energy Data Streams
    Lykothanasi, K. K.
    Sioutas, S.
    Tsichlas, K.
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2022, PART I, 2022, 646 : 331 - 342
  • [44] Actor-Based Incremental Tree Data Processing for Large-Scale Machine Learning Applications
    Sakurai, Kouhei
    Shimizu, Taiki
    PROCEEDINGS OF THE 9TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON PROGRAMMING BASED ON ACTORS, AGENTS, AND DECENTRALIZED CONTROL (AGERE '19), 2019, : 1 - 10
  • [45] Machine learning and geographic information systems for large-scale wind energy potential estimation in rural areas
    Assouline, Dan
    Mohajeri, Nahid
    Mauree, Dasaraden
    Scartezzini, Jean-Louis
    CLIMATE RESILIENT CITIES - ENERGY EFFICIENCY & RENEWABLES IN THE DIGITAL ERA (CISBAT 2019), 2019, 1343
  • [46] Large-Scale Data Learning Method for Anomaly Detection using Machine Learning for Monitoring Vibration in Vehicle Equipment
    Kondo M.
    IEEJ Transactions on Industry Applications, 2020, 140 (06) : 480 - 487
  • [47] Lifewide learning in the city: novel big data approaches to exploring learning with large-scale surveys, GPS, and social media
    Lido, Catherine
    Reid, Kate
    Osborne, Michael
    OXFORD REVIEW OF EDUCATION, 2019, 45 (02) : 279 - 295
  • [48] Machine Learning on Large-Scale Proteomics Data Identifies Tissue and Cell-Type Specific Proteins
    Claeys, Tine
    Menu, Maxime
    Bouwmeester, Robbin
    Gevaert, Kris
    Martens, Lennart
    JOURNAL OF PROTEOME RESEARCH, 2023, 22 (04) : 1181 - 1192
  • [49] A data-driven layout optimization framework of large-scale wind farms based on machine learning
    Yang, Kun
    Deng, Xiaowei
    Ti, Zilong
    Yang, Shanghui
    Huang, Senbin
    Wang, Yuhang
    RENEWABLE ENERGY, 2023, 218
  • [50] Assessing the Potential of UAV for Large-Scale Fractional Vegetation Cover Mapping with Satellite Data and Machine Learning
    Chen, Xunlong
    Sun, Yiming
    Qin, Xinyue
    Cai, Jianwei
    Cai, Minghui
    Hou, Xiaolong
    Yang, Kaijie
    Zhang, Houxi
    REMOTE SENSING, 2024, 16 (19)