Overlapping group screening for binary cancer classification with TCGA high-dimensional genomic data
Journal of Bioinformatics and Computational Biology (IF 1), Pub Date: 2023-06-22, DOI: 10.1142/s0219720023500130
Jie-Huei Wang, Yi-Hau Chen

Precision medicine has become a global trend in medical development, and cancer diagnosis plays an important role in it. Accurate cancer diagnosis allows patients to receive appropriate treatment, which can improve their survival. Because disease development involves a complex interplay among multiple factors, such as gene–gene interactions, cancer classification based on microarray gene expression profiling data is expected to be effective and has attracted extensive attention in computational biology and medicine. However, building a diagnostic model from genomic data poses several problems, including a high-dimensional feature space and feature contamination. In this paper, we propose using the overlapping group screening (OGS) approach to build an accurate cancer diagnosis model that, within the logistic regression framework, predicts the probability that a patient falls into a given disease classification category. The proposal integrates gene pathway information into the procedure for identifying genes and gene–gene interactions associated with the classification of cancer outcome groups. We conduct a series of simulation studies comparing the predictive accuracy of the proposed method with that of some existing machine learning methods, and find that the proposed method performs better. We apply the proposed method to genomic data from The Cancer Genome Atlas (TCGA) for lung adenocarcinoma (LUAD), liver hepatocellular carcinoma (LHC), and thyroid carcinoma (THCA) to establish accurate cancer diagnosis models.
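
The abstract does not spell out the individual OGS steps, so the following Python sketch is only a rough illustration of the general idea: score overlapping pathway groups, keep the genes from the best-scoring groups, and fit a logistic regression on those genes plus their pairwise interactions. The group score, the synthetic data, the pathway names, and every function name here (`group_score`, `screen_groups`, `build_features`, `fit_logistic`, `predict_proba`) are assumptions made for this example and should not be read as the authors' procedure.

```python
import math
import random
from itertools import combinations

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def group_score(X, y, genes):
    """Toy group score (an assumption, not the paper's statistic):
    largest absolute difference in class means over the group's genes."""
    pos = [row for row, t in zip(X, y) if t == 1]
    neg = [row for row, t in zip(X, y) if t == 0]

    def mean(rows, g):
        return sum(r[g] for r in rows) / len(rows)

    return max(abs(mean(pos, g) - mean(neg, g)) for g in genes)

def screen_groups(X, y, pathways, keep):
    """Keep the `keep` best-scoring pathways; return the sorted union of their genes."""
    ranked = sorted(pathways, key=lambda name: group_score(X, y, pathways[name]),
                    reverse=True)
    return sorted({g for name in ranked[:keep] for g in pathways[name]})

def build_features(row, genes):
    """Main effects followed by all pairwise products (gene-gene interactions)."""
    main = [row[g] for g in genes]
    inter = [row[a] * row[b] for a, b in combinations(genes, 2)]
    return main + inter

def predict_proba(w, f):
    """Predicted probability of class 1; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * fj for wj, fj in zip(w[1:], f)))

def fit_logistic(F, y, lr=0.5, epochs=300, l2=0.01):
    """Batch gradient descent with a ridge penalty on the non-intercept weights."""
    n = len(F)
    w = [0.0] * (len(F[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, t in zip(F, y):
            err = predict_proba(w, f) - t
            grad[0] += err
            for j, fj in enumerate(f):
                grad[j + 1] += err * fj
        w = [w[0] - lr * grad[0] / n] + [
            wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w[1:], grad[1:])
        ]
    return w

# Synthetic data: 6 "genes"; the label depends on genes 0 and 2 and their interaction.
random.seed(0)
n_samples, n_genes = 200, 6
X = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
y = [1 if r[0] + r[2] + r[0] * r[2] > 0 else 0 for r in X]

# Hypothetical pathways; gene 2 sits in both P1 and P2, so the groups overlap.
pathways = {"P1": [0, 1, 2], "P2": [2, 3], "P3": [4, 5]}

selected = screen_groups(X, y, pathways, keep=2)
F = [build_features(r, selected) for r in X]
w = fit_logistic(F, y)
probs = [predict_proba(w, f) for f in F]
accuracy = sum((p > 0.5) == (t == 1) for p, t in zip(probs, y)) / n_samples
print(selected, round(accuracy, 3))
```

Screening at the pathway level first keeps the number of candidate interaction terms manageable: with the four genes retained here there are only six pairwise products, whereas forming all pairs across a full genomic feature set would be far larger. The accuracy printed is on the training data and is meant only to show that the pieces fit together.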




Updated: 2023-06-22