Machine learning framework with feature selection approaches for thyroid disease classification and associated risk factors identification
Journal of Electrical Systems and Information Technology. Pub Date: 2023-06-16, DOI: 10.1186/s43067-023-00101-5
Azrin Sultana, Rakibul Islam

Thyroid disease (TD) develops when the thyroid does not produce an adequate quantity of thyroid hormones, or when a lump or nodule emerges due to aberrant growth of the thyroid gland. Early detection is therefore important for preventing or minimizing the impact of this disease. In this study, different machine learning (ML) algorithms were combined with a scaling method, an oversampling technique, and several feature selection approaches to build an efficient framework for classifying TD. Significant risk factors of TD were also identified in the proposed system. The dataset was collected from the University of California Irvine (UCI) repository. In the preprocessing stage, the Synthetic Minority Oversampling Technique (SMOTE) was used to resolve the class imbalance problem, and a robust scaling technique was used to scale the dataset. The Boruta, Recursive Feature Elimination (RFE), and Least Absolute Shrinkage and Selection Operator (LASSO) approaches were used to select appropriate features. To train the model, we employed six different ML classifiers: Support Vector Machine (SVM), AdaBoost (AB), Decision Tree (DT), Gradient Boosting (GB), K-Nearest Neighbors (KNN), and Random Forest (RF). The models were evaluated using 5-fold cross-validation (CV), and several performance metrics were compared to assess the effectiveness of the algorithms. The system achieved its most accurate results with the RF classifier, at 99% accuracy. The proposed system will help physicians and patients classify TD and learn about its associated risk factors.
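
The two preprocessing steps named in the abstract, oversampling followed by robust scaling, can be illustrated with a minimal Python sketch that uses only the standard library. This is not the authors' code. The function names `robust_scale` and `smote_like`, the toy data, and the choice of `k=2` neighbours are all assumptions made for illustration, and a real study would more likely rely on a maintained library implementation.

```python
import math
import random
import statistics


def robust_scale(column):
    """Scale one feature column as (x - median) / IQR.

    Using the median and interquartile range makes the scaling less
    sensitive to outliers than mean/standard-deviation scaling.
    """
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    median = statistics.median(column)
    iqr = q3 - q1
    scale = iqr if iqr != 0 else 1.0  # avoid division by zero on constant columns
    return [(x - median) / scale for x in column]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples (SMOTE-style sketch).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, invented data: 6 majority-class rows and 3 minority-class rows.
majority = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0],
            [4.0, 13.0], [5.0, 14.0], [6.0, 12.0]]
minority = [[8.0, 30.0], [9.0, 32.0], [10.0, 31.0]]

# Step 1: oversample the minority class until both classes have 6 rows.
new_points = smote_like(minority, n_new=len(majority) - len(minority))
balanced = majority + minority + new_points

# Step 2: robust-scale each feature column of the balanced data.
columns = [list(col) for col in zip(*balanced)]
scaled_columns = [robust_scale(col) for col in columns]
scaled = [list(row) for row in zip(*scaled_columns)]
```

Because each synthetic row is an interpolation between two existing minority rows, it always stays inside the range spanned by the minority class. The sketch oversamples before scaling only because that is the order in which the abstract lists the steps. In a cross-validated evaluation, oversampling is commonly applied to the training folds only, so that synthetic rows do not leak into the held-out fold; the abstract does not say how this was handled.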

Updated: 2023-06-19