Improvement of multi-task learning by data enrichment: application for drug discovery
Journal of Computer-Aided Molecular Design ( IF 3.5 ) Pub Date : 2023-03-21 , DOI: 10.1007/s10822-023-00500-w
Ekaterina A Sosnina 1 , Sergey Sosnin 2 , Maxim V Fedorov 1, 3

Multi-task learning in deep neural networks has become increasingly important in many research fields, including drug discovery. However, applying multi-task learning poses new challenges in improving prediction performance. This study investigated whether enriching the training data can improve the prediction quality of multi-task models in drug discovery. We evaluated four scenarios that differ in the information capacity of the training data, and we used two types of test data to assess prediction performance. We used three datasets: ViralChEMBL, which consists of binary activities of compounds against viral species, for the classification task; and pQSAR(159) and pQSAR(4267), which consist of compound bioactivities and assays from research on the profile-QSAR method, for the regression tasks. We built multi-task models based on feed-forward deep neural networks (DNNs) using the PyTorch framework. Our findings show that training data enrichment can be an effective means of improving prediction performance in multi-task learning, but the degree of improvement depends on the quality of the training data. The more unique compounds and targets the training data include, the more new compound-target interactions are required to improve prediction. We also found that, even with multi-task learning, one cannot predict the interactions of compounds that are highly dissimilar from those used for model training. The study provides recommendations for employing multi-task learning effectively in drug discovery to improve prediction accuracy and facilitate the discovery of novel drug candidates.
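
To make the modelling setup concrete, the following is a minimal sketch of a multi-task feed-forward network in PyTorch, not the authors' implementation. The abstract does not give the architecture, layer sizes, or loss, so the class `MultiTaskNet`, the function `masked_mse`, the hidden size, and the use of a mask for unmeasured compound-target pairs are all assumptions made for illustration. The mask reflects the common situation in which each compound has measurements for only some tasks.

```python
import torch
from torch import nn


class MultiTaskNet(nn.Module):
    """Shared feed-forward trunk with one output per task (hypothetical layout)."""

    def __init__(self, n_features: int, n_tasks: int, hidden: int = 64):
        super().__init__()
        # Layers shared by all tasks; sizes are illustrative assumptions.
        self.trunk = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        # One output unit per task (assay or target).
        self.head = nn.Linear(hidden, n_tasks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.trunk(x))


def masked_mse(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean squared error over measured (compound, task) pairs only."""
    measured = mask.bool()
    # Zero out unmeasured targets so NaN placeholders cannot leak into the loss.
    safe_target = torch.where(measured, target, torch.zeros_like(target))
    weights = measured.to(pred.dtype)
    sq_err = (pred - safe_target) ** 2 * weights
    # Guard against division by zero when a batch has no measurements.
    return sq_err.sum() / weights.sum().clamp(min=1.0)


# Toy data: 8 compounds, 16 descriptor values, 3 regression tasks.
torch.manual_seed(0)
x = torch.randn(8, 16)
y = torch.randn(8, 3)
mask = torch.rand(8, 3) > 0.5  # True where a measurement exists

model = MultiTaskNet(n_features=16, n_tasks=3)
loss = masked_mse(model(x), y, mask)
loss.backward()  # gradients flow only from measured pairs
```

In this sketch, enriching the training data corresponds to turning more entries of `mask` to `True` or adding rows for new compounds. For a binary classification task such as the one described for ViralChEMBL, the squared error could be replaced by `torch.nn.functional.binary_cross_entropy_with_logits` with the same masking idea; that substitution is likewise an assumption.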




Updated: 2023-03-21