MKG-GC: A multi-task learning-based knowledge graph construction framework with personalized application to gastric cancer
Computational and Structural Biotechnology Journal (IF 6), Pub Date: 2024-03-27, DOI: 10.1016/j.csbj.2024.03.021
Yang Yang, Yuwei Lu, Zixuan Zheng, Hao Wu, Yuxin Lin, Fuliang Qian, Wenying Yan

Over the past decade, information relevant to precision medicine has accumulated largely in the form of textual data. To make effective use of this expanding body of medical text, we proposed MKG, a multi-task learning framework for knowledge graph construction built on hard parameter sharing, and used it to automatically extract gastric cancer (GC)-related biomedical knowledge from the literature and to identify GC drug candidates. In MKG, we designed three separate modules, MT-BGIPN, MT-SGTF and MT-ScBERT, for entity recognition, entity normalization, and relation classification, respectively. To cope with the long and irregular names of medical entities, MT-BGIPN combined a bidirectional gated recurrent unit (BiGRU) with an interactive pointer network, significantly improving entity recognition and reaching an average F1 score of 84.5% across datasets. MT-SGTF employed term frequency-inverse document frequency (TF-IDF) and a gated attention unit, which together combine semantic and characteristic features of entities, yielding an average Hits@1 score of 94.5% across five datasets. MT-ScBERT integrated cross-text, entity, and context features, yielding an average F1 score of 86.9% across 11 relation classification datasets. Based on MKG, we then built a GC-specific knowledge graph (MKG-GC) comprising 9,129 entities and 88,482 triplets. MKG-GC was then used to predict potential GC drugs with a pre-trained language model, BioKGE-BERT, and a drug-disease discriminant model based on CNN-BiLSTM. Notably, nine of the top ten predicted drugs have previously been reported as effective in gastric cancer treatment. Finally, an online platform was created for exploring and visualizing MKG-GC at .
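As a rough illustration of the TF-IDF side of entity normalization in MT-SGTF, the sketch below ranks candidate concept names for a mention by cosine similarity of TF-IDF vectors and scores the rankings with Hits@k, the metric reported above. It is a minimal sketch under stated assumptions: the character-trigram features, the smoothed IDF formula, the toy concept list and the function names (`rank_candidates`, `hits_at_k`) are hypothetical choices for illustration, and the gated attention unit that MT-SGTF uses to bring in semantic features is omitted entirely.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams of a lower-cased, space-padded term."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_idf(concepts: list[str], n: int = 3) -> dict[str, float]:
    """Smoothed IDF of each n-gram, treating every concept name as one document."""
    df = Counter()
    for name in concepts:
        df.update(set(char_ngrams(name, n)))
    total = len(concepts)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf(text: str, idf: dict[str, float], n: int = 3) -> dict[str, float]:
    """Sparse TF-IDF vector; n-grams unseen in the vocabulary get the maximum IDF."""
    unseen = max(idf.values())
    counts = Counter(char_ngrams(text, n))
    return {g: tf * idf.get(g, unseen) for g, tf in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(mention: str, concepts: list[str]) -> list[str]:
    """Concept names sorted from most to least similar to the mention."""
    idf = build_idf(concepts)
    query = tfidf(mention, idf)
    scored = [(cosine(query, tfidf(c, idf)), c) for c in concepts]
    return [c for _, c in sorted(scored, key=lambda pair: -pair[0])]


def hits_at_k(rankings: list[list[str]], gold: list[str], k: int = 1) -> float:
    """Fraction of mentions whose gold concept appears among the top-k candidates."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)


# Hypothetical toy vocabulary and mentions, for illustration only.
concepts = ["gastric cancer", "gastric ulcer", "colorectal cancer", "breast cancer"]
mentions = ["gastric cancers", "breast cancer"]
gold = ["gastric cancer", "breast cancer"]
rankings = [rank_candidates(m, concepts) for m in mentions]
print(rankings[0][0], hits_at_k(rankings, gold, k=1))  # gastric cancer 1.0
```

Character n-grams are used here because they tolerate small surface variations such as plurals; how MT-SGTF actually tokenizes and fuses its features is not described in this abstract.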
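The pointer-network idea behind MT-BGIPN can be pictured as predicting, for each token, the probability that an entity starts or ends there, and then pairing starts with ends. The sketch below shows only that decoding step, under assumptions: the probabilities are hand-written stand-ins for model outputs, and the threshold, maximum span length and nearest-end pairing rule in the hypothetical `decode_spans` function are a common heuristic chosen for illustration, not the decoding procedure of MT-BGIPN. The BiGRU encoder and the interactive component are not modeled.

```python
def decode_spans(
    start_probs: list[float],
    end_probs: list[float],
    threshold: float = 0.5,
    max_len: int = 10,
) -> list[tuple[int, int]]:
    """Pair each predicted start with the nearest predicted end at or after it.

    Spans are inclusive (start, end) token indices; spans longer than
    max_len tokens are discarded.
    """
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        # ends is ascending, so the first end >= s is the nearest one.
        e = next((j for j in ends if j >= s), None)
        if e is not None and e - s + 1 <= max_len:
            spans.append((s, e))
    return spans


tokens = ["Patients", "with", "advanced", "gastric", "adenocarcinoma",
          "received", "apatinib", "."]
# Hypothetical per-token probabilities standing in for a trained model's outputs.
start_probs = [0.1, 0.0, 0.2, 0.9, 0.1, 0.0, 0.8, 0.0]
end_probs = [0.0, 0.0, 0.1, 0.2, 0.95, 0.0, 0.7, 0.0]

spans = decode_spans(start_probs, end_probs)
print([" ".join(tokens[s:e + 1]) for s, e in spans])
# ['gastric adenocarcinoma', 'apatinib']
```

Because each span is defined only by its two boundary positions, the same decoder returns one-token names such as a drug and multi-token names such as a disease description.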

Updated: 2024-03-27