oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data

Cited by: 2
Authors
Dang, Tung [1, 2]
Fermin, Alan S. R. [1]
Machizawa, Maro G. [1]
Affiliations
[1] Hiroshima Univ, Ctr Brain Mind & KANSEI Sci Res, Hiroshima, Japan
[2] Univ Tokyo, Grad Sch Agr & Life Sci, Tokyo, Japan
Keywords
machine learning; forward variable selection; optimized hyperparameter; neural decoding; MRI; VBM (voxel-based morphometry); RANDOM FOREST REGRESSION; DECISION TREE; PREDICTION; IMPACT; MODEL; FMRI; AGE
DOI
10.3389/fninf.2023.1266713
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
The complexity and high dimensionality of neuroimaging data pose problems for decoding information with machine learning (ML) models, because the number of features is often much larger than the number of observations. Feature selection is a crucial step for determining meaningful target features in decoding; however, optimizing feature selection on such high-dimensional neuroimaging data has been challenging with conventional ML models. Here, we introduce an efficient, high-performance decoding package that combines a forward variable selection (FVS) algorithm with hyperparameter optimization to automatically identify the best combination of features for both classification and regression models; a total of 18 ML models are implemented by default. First, the FVS algorithm evaluates the goodness-of-fit of each model with k-fold cross-validation, identifying the best subset of features according to a predefined criterion for that model. Next, the hyperparameters of each ML model are optimized at each forward iteration. The final output reports, for each model, the optimized number of selected features (brain regions of interest) together with its accuracy. Furthermore, the toolbox can be executed in parallel for efficient computation on a typical personal computer. Using the optimized forward variable selection decoder (oFVSD) pipeline, we verified its effectiveness in decoding sex (classification) and age range (regression) from 1,113 structural magnetic resonance imaging (MRI) datasets.
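The FVS loop described above can be illustrated with a minimal, dependency-free sketch. It is not the oFVSD API: the function names (`forward_select`, `cv_accuracy`, `knn_predict`), the use of a single k-nearest-neighbours classifier in place of the package's 18 models, the `n_neighbors` grid and the `max_features` cap are all hypothetical choices made for illustration. It only mirrors the structure stated in the abstract: each candidate subset is scored by k-fold cross-validation, the model's hyperparameter is re-tuned at every forward iteration, and the subset size with the best score is reported.

```python
# Hypothetical sketch of forward variable selection (not the oFVSD API).
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Shuffle range(n) once and deal it into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors):
    """Majority label among the n_neighbors nearest rows (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:n_neighbors]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def cv_accuracy(X, y, features, n_neighbors, k=5):
    """Mean k-fold accuracy of a kNN classifier that sees only `features`."""
    scores = []
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        train_X = [[X[i][f] for f in features] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, [X[i][f] for f in features], n_neighbors) == y[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return mean(scores)


def forward_select(X, y, param_grid=(1, 3, 5), max_features=3, k=5):
    """Greedy forward variable selection, re-tuning n_neighbors at every step.

    Returns (best_features, best_score, best_n_neighbors, history); the best entry
    is the earliest (smallest) subset that reaches the highest CV accuracy.
    """
    selected, remaining, history = [], set(range(len(X[0]))), []
    while remaining and len(selected) < max_features:
        # Try every remaining feature combined with every hyperparameter value.
        score, feat, param = max(
            (cv_accuracy(X, y, selected + [f], p, k), f, p)
            for f in remaining
            for p in param_grid
        )
        selected.append(feat)
        remaining.discard(feat)
        history.append((list(selected), score, param))
    best = max(history, key=lambda step: step[1])  # first maximum = fewest features
    return best[0], best[1], best[2], history
```

Because `kfold_indices` uses a fixed seed, every candidate subset is scored on the same folds, which keeps comparisons between subsets fair within one run.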
Compared with ML models without the FVS algorithm and with ML models using the Boruta algorithm as a variable-selection counterpart, we demonstrate that oFVSD significantly outperformed the counterpart models across all of the ML models: on average, it improved on models without FVS by approximately 0.20 in the correlation coefficient r for regression and by 8% for classification, and on models using Boruta variable selection by approximately 0.07 for regression and 4% for classification. Furthermore, we confirmed that parallel computation considerably reduced the computational burden for the high-dimensional MRI data. Altogether, the oFVSD toolbox efficiently and effectively improves the performance of both classification and regression ML models, as shown in a use case on MRI datasets. Given its flexibility, oFVSD has the potential to be applied to many other neuroimaging modalities. As an open-source, freely available Python package, it is a valuable toolbox for research communities seeking improved decoding accuracy.
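The abstract reports that parallel computation considerably reduced the computational burden but does not say exactly how oFVSD parallelizes its work. One natural place, assumed here, is within a single forward iteration: every candidate subset `selected + [f]` is scored independently, so the candidates can be handed to a pool of workers. In this sketch `subset_score` is a hypothetical toy score (the absolute Pearson r between the row-wise mean of the selected features and the target), standing in for a cross-validated model score such as the r used for age regression; `best_candidate` is likewise an illustrative name, not part of the package.

```python
# Hypothetical sketch: scoring the candidates of one forward step in parallel.
import math
from concurrent.futures import Executor
from functools import partial


def pearson_r(a, b):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def subset_score(X, y, features):
    """Toy stand-in for a CV model score: |r| of the mean of `features` with y."""
    composite = [sum(row[f] for f in features) / len(features) for row in X]
    return abs(pearson_r(composite, y))


def best_candidate(X, y, selected, remaining, executor: Executor):
    """Score every `selected + [f]` concurrently and return (best_score, best_f)."""
    candidates = sorted(remaining)
    subsets = [selected + [f] for f in candidates]
    # Executor.map yields results in submission order, whatever finishes first.
    scores = executor.map(partial(subset_score, X, y), subsets)
    return max(zip(scores, candidates))
```

Any `concurrent.futures` executor can be passed in. For CPU-bound, pure-Python scoring, a `ProcessPoolExecutor` created under an `if __name__ == "__main__":` guard gives real parallelism; a `ThreadPoolExecutor` returns identical results but little speedup because of the GIL.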
Pages: 14