oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data

Cited by: 2
Authors
Dang, Tung [1, 2]
Fermin, Alan S. R. [1]
Machizawa, Maro G. [1 ]
Affiliations
[1] Hiroshima Univ, Ctr Brain Mind & KANSEI Sci Res, Hiroshima, Japan
[2] Univ Tokyo, Grad Sch Agr & Life Sci, Tokyo, Japan
Keywords
machine learning; forward variable selection; optimized hyperparameter; neural decoding; MRI; VBM (voxel-based morphometry); RANDOM FOREST REGRESSION; DECISION TREE; PREDICTION; IMPACT; MODEL; FMRI; AGE
DOI
10.3389/fninf.2023.1266713
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
The complexity and high dimensionality of neuroimaging data pose problems for decoding information with machine learning (ML) models because the number of features is often much larger than the number of observations. Feature selection is a crucial step in determining meaningful target features for decoding; however, optimizing feature selection on such high-dimensional neuroimaging data has been challenging with conventional ML models. Here, we introduce an efficient, high-performance decoding package that combines a forward variable selection (FVS) algorithm with hyperparameter optimization and automatically identifies the best feature pairs for both classification and regression models; a total of 18 ML models are implemented by default. First, the FVS algorithm evaluates goodness-of-fit across the different models using k-fold cross-validation, identifying the best subset of features for each model according to a predefined criterion. Next, the hyperparameters of each ML model are optimized at each forward iteration. The final outputs report, for each model, the optimized number of selected features (brain regions of interest) together with its accuracy. Furthermore, the toolbox can be executed in a parallel environment for efficient computation on a typical personal computer. With the optimized forward variable selection decoder (oFVSD) pipeline, we verified the effectiveness of decoding on sex classification and age range regression using 1,113 structural magnetic resonance imaging (MRI) datasets.
We compared oFVSD against ML models without the FVS algorithm and against models using the Boruta algorithm as a variable selection counterpart. Across all ML models, oFVSD significantly outperformed both the counterpart models without FVS (on average, an increase of approximately 0.20 in correlation coefficient, r, for regression models and an 8% increase for classification models) and the models with Boruta variable selection (an improvement of approximately 0.07 for regression and 4% for classification models). Furthermore, we confirmed that parallel computation considerably reduced the computational burden for the high-dimensional MRI data. Altogether, the oFVSD toolbox efficiently and effectively improves the performance of both classification and regression ML models, with MRI datasets serving as a use case example. Given its flexibility, oFVSD has the potential to be applied to many other neuroimaging modalities. As an open-source and freely available Python package, it is a valuable toolbox for research communities seeking improved decoding accuracy.
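The forward selection loop described in the abstract can be illustrated with a minimal sketch. This is not the oFVSD API: the function names (`kfold_indices`, `cv_score`, `forward_select`), the closed-form ridge regression used as the model, the small grid of penalty values, the Pearson r criterion, and the stopping tolerance are all assumptions chosen for illustration. At each forward iteration, every remaining feature is tried together with every hyperparameter value, and the best cross-validated combination is kept.

```python
import numpy as np


def kfold_indices(n_samples, k, rng):
    """Split shuffled sample indices into k roughly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)


def cv_score(X, y, folds, alpha):
    """Pearson r between pooled out-of-fold ridge predictions and targets."""
    preds = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        # Centre with training statistics only, to avoid leaking test data.
        x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
        Xc = X_tr - x_mean
        # Closed-form ridge solution (illustrative model, not from the paper).
        gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
        coef = np.linalg.solve(gram, Xc.T @ (y_tr - y_mean))
        preds[test_idx] = (X[test_idx] - x_mean) @ coef + y_mean
    r = np.corrcoef(preds, y)[0, 1]
    return r if np.isfinite(r) else -1.0


def forward_select(X, y, alphas=(0.1, 1.0, 10.0), k=5,
                   max_features=5, tol=1e-3, seed=0):
    """Greedy forward selection; alpha is re-tuned at every iteration."""
    rng = np.random.default_rng(seed)
    folds = kfold_indices(len(y), k, rng)
    selected, best_score, best_alpha = [], -np.inf, None
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Score every (candidate feature, hyperparameter) combination.
        trials = [
            (cv_score(X[:, selected + [j]], y, folds, a), j, a)
            for j in remaining
            for a in alphas
        ]
        score, j, a = max(trials, key=lambda t: t[0])
        if score - best_score < tol:
            break  # no candidate improves the criterion enough
        selected.append(j)
        remaining.remove(j)
        best_score, best_alpha = score, a
    return selected, best_score, best_alpha


# Synthetic data (hypothetical): only features 3 and 7 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 30))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(scale=0.5, size=200)

selected, score, alpha = forward_select(X, y)
print("selected features:", selected)
print("cross-validated r:", round(float(score), 3), "alpha:", alpha)
```

The candidate evaluations inside each iteration are independent of one another, so in principle they could be distributed across worker processes, which is in the spirit of the parallel execution mentioned in the abstract. How the package itself parallelizes its computation is not specified here.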
Pages: 14