oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data
Frontiers in Neuroinformatics (IF 3.5), Pub Date: 2023-09-27, DOI: 10.3389/fninf.2023.1266713
Tung Dang, Alan S R Fermin, Maro G Machizawa

The complexity and high dimensionality of neuroimaging data pose problems for decoding information with machine learning (ML) models because the number of features is often much larger than the number of observations. Feature selection is one of the crucial steps for determining meaningful target features in decoding; however, optimizing feature selection from such high-dimensional neuroimaging data has been challenging with conventional ML models. Here, we introduce an efficient, high-performance decoding package that combines a forward variable selection (FVS) algorithm with hyperparameter optimization to automatically identify the best feature pairs for both classification and regression models; a total of 18 ML models are implemented by default. First, the FVS algorithm evaluates goodness of fit across models using k-fold cross-validation and identifies the best subset of features for each model according to a predefined criterion. Next, the hyperparameters of each ML model are optimized at each forward iteration. The final output reports, for each model, the optimized number of selected features (brain regions of interest) together with its accuracy. Furthermore, the toolbox can be executed in a parallel environment for efficient computation on a typical personal computer. Using the optimized forward variable selection decoder (oFVSD) pipeline, we verified its effectiveness for sex classification and age range regression on 1,113 structural magnetic resonance imaging (MRI) datasets. We compared oFVSD against the same ML models run without FVS and against models using the Boruta algorithm as an alternative variable selection method. oFVSD performed significantly better across all ML models in both comparisons: relative to models without FVS, the correlation coefficient, r, increased by approximately 0.20 for regression models and performance improved by 8% for classification models on average; relative to Boruta variable selection, the improvements were approximately 0.07 for regression and 4% for classification models. We also confirmed that parallel computation considerably reduced the computational burden for the high-dimensional MRI data. Altogether, the oFVSD toolbox efficiently and effectively improves the performance of both classification and regression ML models, as shown here in a use case on MRI datasets. Given its flexibility, oFVSD could be applied to many other neuroimaging modalities. As an open-source, freely available Python package, it is a valuable toolbox for research communities seeking improved decoding accuracy.
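The forward selection step described in the abstract can be illustrated with a small sketch. This is not the oFVSD implementation or its API: every function name below (`kfold_indices`, `centroid_accuracy`, `cv_score`, `forward_select`) is hypothetical, a nearest-centroid classifier stands in for the 18 ML models, the stopping rule (no gain in cross-validated accuracy) is an assumed criterion, and the per-iteration hyperparameter optimization and parallel execution are omitted. It only shows the general idea of greedily adding the feature that most improves the k-fold cross-validated score.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle the sample indices 0..n-1 and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid_accuracy(X, y, cols, train, test):
    """Fit a nearest-centroid classifier on `train` using only the
    columns in `cols`, then return its accuracy on `test`."""
    centroids = {}
    for label in set(y[i] for i in train):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]

    def predict(row):
        # Pick the class whose centroid is closest in squared distance.
        return min(centroids, key=lambda lab: sum(
            (row[c] - m) ** 2 for c, m in zip(cols, centroids[lab])))

    hits = sum(1 for i in test if predict(X[i]) == y[i])
    return hits / len(test)


def cv_score(X, y, cols, folds):
    """Mean accuracy over the folds for one candidate feature subset."""
    scores = []
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        scores.append(centroid_accuracy(X, y, cols, train, test))
    return sum(scores) / len(scores)


def forward_select(X, y, k=5, min_gain=1e-9):
    """Greedy forward selection: repeatedly add the single feature that
    gives the best cross-validated score; stop when nothing improves it."""
    folds = kfold_indices(len(X), k)
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        scored = [(cv_score(X, y, selected + [c], folds), c)
                  for c in sorted(remaining)]
        # Ties are broken in favour of the lowest column index.
        score, col = max(scored, key=lambda sc: (sc[0], -sc[1]))
        if score - best <= min_gain:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


# Synthetic toy data (not MRI data): column 0 carries the class signal,
# columns 1 and 2 are pure noise.
rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[label * 5.0 + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
     for label in y]

selected, score = forward_select(X, y)
print("selected columns:", selected, "cv accuracy:", round(score, 3))
```

On this toy data the informative column should be chosen first. In a real pipeline the inner scoring call would be replaced by the model and criterion of interest, and the candidate evaluations within each iteration are independent of one another, which is what makes them natural to distribute across processes.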

Updated: 2023-09-27