Supervised machine learning for diagnostic classification from large-scale neuroimaging datasets

Cited by: 0
Authors
Pradyumna Lanka
D Rangaprakash
Michael N. Dretsch
Jeffrey S. Katz
Thomas S. Denney
Gopikrishna Deshpande
Affiliations
[1] Auburn University, AU MRI Research Center, Department of Electrical and Computer Engineering
[2] University of California Merced, Department of Psychological Sciences
[3] Northwestern University, Departments of Radiology and Biomedical Engineering
[4] U.S. Army Aeromedical Research Laboratory, US Army Medical Research Directorate
[5] Walter Reed Army Institute for Research, West
[6] Auburn University, Department of Psychology
[7] Alabama Advanced Imaging Consortium, Center for Neuroscience
[8] Auburn University, Center for Health Ecology and Equity Research
[9] Auburn University, Department of Psychiatry
[10] National Institute of Mental and Neurosciences
Source
Brain Imaging and Behavior, 2020, 14 (06)
Keywords
Resting-state functional MRI; Supervised machine learning; Diagnostic classification; Functional connectivity; Autism; ADHD; Alzheimer’s disease; PTSD
DOI
Not available
Abstract
There are growing concerns about the generalizability of machine learning classifiers in neuroimaging. To evaluate generalizability across relatively large, heterogeneous populations, we investigated four disorders: autism spectrum disorder (N = 988), attention deficit hyperactivity disorder (N = 930), post-traumatic stress disorder (N = 87), and Alzheimer’s disease (N = 132). We applied 18 different machine learning classifiers (based on diverse principles), wherein the training/validation data and the hold-out test data came from samples with the same diagnosis but differing in either age range or acquisition site. Our results indicate that overfitting can be a serious problem in heterogeneous datasets, especially those with fewer samples, leading to inflated accuracy estimates that fail to generalize to the general clinical population. Further, different classifiers tended to perform well on different datasets. To address this, we propose a consensus classifier that combines the predictive power of all 18 classifiers. The consensus classifier was less sensitive to unmatched training/validation and hold-out test data. Finally, we combined feature importance scores obtained from all classifiers to infer the discriminative ability of connectivity features. The functional connectivity patterns thus identified were robust to the classification algorithm used and to age and acquisition site differences, and had diagnostic predictive ability in addition to statistically significant univariate group differences. A MATLAB toolbox called Machine Learning in NeuroImaging (MALINI), which implements all 18 classifiers along with the consensus classifier, is available from Lanka et al. (2019). The toolbox can also be found at the following URL: https://github.com/pradlanka/malini.
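The abstract does not say how the consensus classifier combines its 18 base classifiers. As a rough illustration only, the sketch below assumes a simple majority vote over binary labels, one common way to build such an ensemble. The toy threshold classifiers and the names `consensus_predict` and `consensus_accuracy` are hypothetical and are not taken from the MALINI toolbox, which is written in MATLAB.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is any function mapping a feature vector to a class label.
Classifier = Callable[[Sequence[float]], int]


def consensus_predict(classifiers: Sequence[Classifier],
                      sample: Sequence[float]) -> int:
    """Return the majority-vote label (assumed combination rule).

    Ties are broken in favor of the smallest label.
    """
    votes = Counter(clf(sample) for clf in classifiers)
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


def consensus_accuracy(classifiers: Sequence[Classifier],
                       samples: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> float:
    """Fraction of hold-out samples the consensus labels correctly."""
    hits = sum(consensus_predict(classifiers, x) == y
               for x, y in zip(samples, labels))
    return hits / len(labels)


# Hypothetical stand-ins for trained classifiers: each thresholds
# a different view of a two-feature connectivity vector.
def clf_a(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


def clf_b(x: Sequence[float]) -> int:
    return int(x[1] > 0.5)


def clf_c(x: Sequence[float]) -> int:
    return int(sum(x) / len(x) > 0.5)


ensemble = [clf_a, clf_b, clf_c]

# Toy hold-out set, standing in for data from a different site or age range.
holdout_x = [(0.9, 0.2), (0.1, 0.6), (0.8, 0.9)]
holdout_y = [1, 0, 1]

if __name__ == "__main__":
    for x in holdout_x:
        print(x, "->", consensus_predict(ensemble, x))
    print("hold-out accuracy:",
          consensus_accuracy(ensemble, holdout_x, holdout_y))
```

In this toy setup, a single classifier that errs on a sample is outvoted by the other two, which gives some intuition for why an ensemble can be less sensitive to a mismatch between training and hold-out data than any single classifier.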
Pages: 2378-2416
Page count: 38
Related papers
50 in total
  • [1] Supervised machine learning for diagnostic classification from large-scale neuroimaging datasets
    Lanka, Pradyumna
    Rangaprakash, D.
    Dretsch, Michael N.
    Katz, Jeffrey S.
    Denney, Thomas S., Jr.
    Deshpande, Gopikrishna
    [J]. BRAIN IMAGING AND BEHAVIOR, 2020, 14 (06) : 2378 - 2416