Systematic generation and analysis of counterfactuals for compound activity predictions using multi-task models, RSC Medicinal Chemistry

RSC Medicinal Chemistry (IF 4.1), Pub Date: 2024-04-08, DOI: 10.1039/d4md00128a
Alec Lamens¹, Jürgen Bajorath¹,²

Most machine learning (ML) methods produce predictions that are hard or impossible to understand. The black-box nature of predictive models obscures potential learning bias and makes it difficult to recognize and trace problems. Moreover, the inability to rationalize model decisions causes reluctance to accept predictions for experimental design. For ML, limited trust in predictions presents a substantial problem and continues to limit its impact in interdisciplinary research, including early-phase drug discovery. As a desirable remedy, approaches from explainable artificial intelligence (XAI) are increasingly applied to shed light on the ML black box and help to rationalize predictions. Among these is the concept of counterfactuals (CFs), which are best understood as test cases with small modifications yielding opposing prediction outcomes (such as different class labels in object classification). For ML applications in medicinal chemistry such as compound activity prediction, CFs are particularly intuitive: these hypothetical molecules enable immediate comparisons with actual test compounds, and such comparisons do not require expert ML knowledge and are accessible to practicing chemists. These comparisons often reveal structural moieties in compounds that determine their predictions and can be further investigated. Herein, we adapt and extend a recently introduced concept for the systematic generation of molecular CFs to multi-task predictions of different classes of protein kinase inhibitors, analyze CFs in detail, rationalize the origins of CF formation in multi-task modeling, and present exemplary explanations of predictions.
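The counterfactual definition above (a small modification of a test case that yields the opposite prediction) can be illustrated with a minimal, hypothetical sketch. It is not the authors' CF generation method, which the abstract does not specify. Instead, it screens a given pool of candidate structures, represented here as toy fingerprint bit sets, and keeps those that are similar to the test compound by Tanimoto similarity but receive the opposite predicted class for one chosen task of a multi-task model. The function names, the rule-based `toy_model`, the task labels `kinase_A`/`kinase_B`, and the 0.7 similarity threshold are all illustrative assumptions.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical representation: a molecule is the set of "on" bits of a fingerprint.
Fingerprint = FrozenSet[int]
# A multi-task model maps a molecule to one predicted class label (0/1) per task.
MultiTaskModel = Callable[[Fingerprint], Dict[str, int]]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def flipped_tasks(query: Fingerprint, cand: Fingerprint, model: MultiTaskModel) -> List[str]:
    """Tasks whose predicted label differs between the query and a candidate."""
    q, c = model(query), model(cand)
    return sorted(t for t in q if q[t] != c[t])


def find_counterfactuals(
    query: Fingerprint,
    candidates: List[Fingerprint],
    model: MultiTaskModel,
    task: str,
    min_similarity: float = 0.7,
) -> List[Tuple[Fingerprint, float]]:
    """Candidates at least `min_similarity` similar to the query whose prediction
    for `task` is opposite to the query's, sorted from most to least similar."""
    reference = model(query)[task]
    hits = [
        (cand, tanimoto(query, cand))
        for cand in candidates
        if model(cand)[task] != reference
    ]
    hits = [h for h in hits if h[1] >= min_similarity]
    return sorted(hits, key=lambda h: h[1], reverse=True)


def toy_model(fp: Fingerprint) -> Dict[str, int]:
    # Hypothetical rule-based stand-in for a trained multi-task classifier.
    return {"kinase_A": int(5 in fp), "kinase_B": int(9 in fp)}


query = frozenset({1, 2, 3, 5, 9})
candidates = [
    frozenset({1, 2, 3, 9}),  # bit 5 removed: flips kinase_A only
    frozenset({1, 2, 3, 5}),  # bit 9 removed: flips kinase_B only
    frozenset({7, 8}),        # unrelated, dissimilar structure
]

for cand, sim in find_counterfactuals(query, candidates, toy_model, "kinase_A"):
    print(sorted(cand), round(sim, 2), flipped_tasks(query, cand, toy_model))
```

In this toy setting, the single CF for `kinase_A` differs from the query by one bit and changes only that task's prediction. `flipped_tasks` is included because, in a multi-task model, a given modification may flip one task, several tasks, or none, which is the kind of task-dependent CF behavior the abstract refers to.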

Updated: 2024-04-08