Virtual Screening Strategies for Identifying Novel Chemotypes
Journal of Medicinal Chemistry (IF 7.3), Pub Date: 2024-04-23, DOI: 10.1021/acs.jmedchem.4c00906
Stuart Lang, Martin J. Slater

The identification of active compounds, or a series of compounds, is the crucial first step in all drug discovery projects. Ideally, these compounds should display, or have the potential to display after optimization, good physicochemical and drug-like properties. Screening, both physical and virtual, is a process that involves interrogating a protein target of interest with large numbers of compounds to determine credible starting points for drug discovery. A key factor to consider when defining a screening strategy is how much is known about the specific protein target. Virtual screening requires a high level of knowledge of the protein target and/or of ligands that bind to it. A High Throughput Screen (HTS) does not require this level of knowledge but does require access to large quantities of the protein.

In its broadest sense, virtual screening can be classified into two main categories, structure-based and ligand-based virtual screening, each requiring different information. Structure-based virtual screening involves docking large numbers of molecules into a protein. This requires a thorough understanding of the protein structure, such as identifying the key residues in the protein and the contributions these residues make to ligand binding. The role of water molecules in the binding site and the stability of the conformation displayed are also critical for this type of virtual screen. A pharmacologically relevant 3D protein structure is essential for structure-based virtual screening.

In contrast, ligand-based virtual screening does not require a protein structure or even knowledge of the specific protein target, although this information is still useful if available. What a successful ligand-based virtual screen does require is a template ligand that has been constructed in the correct bioactive conformation.
For example, in the ligand-centric solution Blaze, one approach is to construct the template used for the screen by modeling the electrostatic environment around the ligand as field points. The bioactive conformation is not necessarily the same as the global minimum conformation, so preparative work is required to decipher it. Where a protein structure is available, it can also be used in the screen as excluded volume, to remove the risk that compounds identified in the screen clash with the protein surface. While virtual screening is primarily used to identify small molecule starting points, the template ligand does not need to be a small molecule. Small peptide substrates, natural protein binding partners (for example in protein–protein interactions), and contact points in crystallographic packing all give opportunities for template design for ligand-based virtual screening.

Virtual screening increases the likelihood of finding an active molecule; it does not guarantee that a molecule that scores well will be active against the protein target of interest. Generally, it is expected that, on average, more than 1% of molecules identified in a virtual screen will be confirmed experimentally when evaluated against the target. While this may seem modest, compared with the 0.01–0.1% hit rate expected from a nonbiased HTS, enrichment of the screening set using virtual screening offers a 10–100-fold increase in the probability that an evaluated molecule is active. This means that experimental screening of 500–1000 molecules selected from virtual screening would be equivalent to screening 50,000–100,000 molecules via HTS.

As diversity of the molecules progressed from the virtual screen is key to success, the post-screen triage is very important.
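
The hit-rate comparison above can be checked with a few lines of arithmetic. The sketch below is only a worked calculation: the rates are the ones quoted in the text (with "greater than 1%" taken at its lower bound), while the function names are illustrative and not part of any screening package.

```python
# Worked check of the hit-rate comparison quoted in the text.
# Function names are illustrative; only the rates come from the article.

def fold_enrichment(vs_hit_rate: float, hts_hit_rate: float) -> float:
    """How many times more likely a virtually screened pick is to be active."""
    return vs_hit_rate / hts_hit_rate


def equivalent_hts_size(n_tested: int, vs_hit_rate: float, hts_hit_rate: float) -> int:
    """HTS deck size expected to yield as many hits as n_tested selected compounds."""
    expected_hits = n_tested * vs_hit_rate
    return round(expected_hits / hts_hit_rate)


VS_RATE = 0.01               # "greater than 1%", taken at its lower bound
HTS_RATES = (0.0001, 0.001)  # 0.01% and 0.1% for a nonbiased HTS

for hts_rate in HTS_RATES:
    fold = fold_enrichment(VS_RATE, hts_rate)
    for n in (500, 1000):
        size = equivalent_hts_size(n, VS_RATE, hts_rate)
        print(f"HTS rate {hts_rate:.2%}: {fold:.0f}-fold; {n} picks ~ {size:,} HTS compounds")
```

Under these assumptions, the 50,000–100,000 equivalence corresponds to the 0.01% end of the HTS range (100-fold enrichment); at a 0.1% HTS hit rate (10-fold), the same 500–1000 picks would be equivalent to 5,000–10,000 HTS compounds.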
Removal of compounds that do not display acceptable physicochemical properties, based on parameters such as logP/D, TPSA, and HBD/HBA counts, is essential. Examples of metrics used for scoring a virtual screen are 3D shape similarity to the ligand template, docking, and Electrostatic Complementarity (EC) scoring; the latter two require a protein structure to be available. As each of these metrics scores molecules differently, it is important to consider each independently when selecting compounds. A compound could score better by one method than by another and, at this stage, it is unknown which metric is more likely to correlate with activity. Finally, as per the Journal of Medicinal Chemistry author submission guidelines, virtual screening lists should be filtered to remove PAINS and other reactive groups.

Clustering is a crucial step in ensuring diversity in the post-triage set. By grouping compounds based on their structure, it is possible to ensure that the final set is not dominated by a particular scaffold that scores well on prioritization metrics that may not transfer to measured activity. Another advantage of clustering the set prior to experimental validation is that the clusters can be mined to generate Structure Activity Relationship (SAR): once a cluster is known to contain hit compounds, untested compounds from that cluster can subsequently be screened.

Virtual screening compound collections need to be readily available. One method of ensuring that compounds are available for experimental validation is to limit the virtual screening collection to commercially available compounds; this restriction allows around 23 million compounds to be routinely screened. However, organizations that possess in-house HTS collections, perhaps containing proprietary compounds, can also use these collections for virtual screening to select a subset for screening against a target.
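
The triage steps described above (property filtering, treating each scoring metric independently, and clustering for diversity) can be sketched in plain Python. Everything here is an assumption for illustration: the `Candidate` record, the property cutoffs, the 0.6 similarity threshold, the toy data, and the use of simple leader clustering on Tanimoto similarity are placeholders rather than the authors' protocol. Properties and fingerprints are assumed to be precomputed elsewhere, and PAINS filtering, which needs substructure matching, is left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    logp: float
    tpsa: float
    hbd: int
    hba: int
    fingerprint: frozenset  # "on" bit indices of a precomputed 2D fingerprint
    scores: dict            # metric name -> score, higher is better


def passes_property_filter(c, max_logp=5.0, max_tpsa=140.0, max_hbd=5, max_hba=10):
    """Placeholder cutoffs; a real project would set its own limits."""
    return (c.logp <= max_logp and c.tpsa <= max_tpsa
            and c.hbd <= max_hbd and c.hba <= max_hba)


def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(cands, threshold=0.6):
    """Greedy clustering: join the first cluster whose leader is similar enough."""
    clusters = []
    for c in cands:
        for cluster in clusters:
            if tanimoto(c.fingerprint, cluster[0].fingerprint) >= threshold:
                cluster.append(c)
                break
        else:
            clusters.append([c])
    return clusters


def pick_per_cluster(clusters, metrics):
    """For each metric independently, keep the best scorer of every cluster."""
    picked = {}
    for cluster in clusters:
        for metric in metrics:
            best = max(cluster, key=lambda c: c.scores[metric])
            picked[best.name] = best
    return list(picked.values())


# Toy data, invented for the example.
library = [
    Candidate("A1", 2.1, 80.0, 2, 5, frozenset({1, 2, 3, 4, 5}), {"shape": 0.90, "docking": 0.40}),
    Candidate("A2", 2.4, 85.0, 2, 5, frozenset({1, 2, 3, 4, 6}), {"shape": 0.70, "docking": 0.80}),
    Candidate("B1", 3.0, 60.0, 1, 4, frozenset({7, 8, 9}), {"shape": 0.60, "docking": 0.50}),
    Candidate("C1", 6.5, 150.0, 6, 11, frozenset({10, 11}), {"shape": 0.95, "docking": 0.90}),
]

kept = [c for c in library if passes_property_filter(c)]
clusters = leader_cluster(kept)
selection = pick_per_cluster(clusters, metrics=("shape", "docking"))
print([c.name for c in selection])
```

In this toy set, C1 has the best scores but is removed by the property filter, and A1 and A2 are both retained from the same cluster because each wins on a different metric, which mirrors the advice to consider the metrics independently.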
Virtual screening of an in-house collection may be beneficial in cases where it is difficult to prepare enough protein for a full HTS campaign, or to identify molecules that may have been missed in the initial HTS.

Recently, due to the development of compound sets such as EnamineREAL, which contains virtual compounds that are readily synthesizable from available building blocks, off-the-shelf availability of compounds for virtual screening has become less of a limitation. However, increasing the screening set from around 23 million to 6 billion compounds presents a different challenge: the scale of virtual screening. For this reason, a synthon-based approach, an example of which is demonstrated in Ignite, can be applied. This involves screening each of the building blocks used to create the full set against the target protein and generating final molecules only from those synthons that scored highly enough. Each of these final molecules then undergoes a full screen, like that used in the previously described processes, to generate a virtual screening longlist that can subsequently be triaged to the 500–1000 compounds that are recommended for testing.

Virtual screening can allow novel chemotypes against a protein target of interest to be identified without the significant upfront investment required to curate and store an HTS collection. While it still requires the testing of a significant selection of compounds, it does not require the automated liquid handling/pipetting systems that are necessary to process an HTS. This makes virtual screening an accessible and attractive prospect to drug discovery organizations of all sizes. The upfront intellectual information generated prior to the screen, coupled with the post-screen triage, can also, if conducted effectively, allow for efficiencies in the early hit-to-lead phases of the project.
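
The two-stage logic of the synthon-based approach described above can be illustrated with a toy sketch. It is not the Ignite implementation: the function `synthon_screen`, the scoring callables, the cutoff, and the building-block names and scores are all hypothetical, chosen only to show why scoring building blocks first avoids enumerating the full product space.

```python
from itertools import product


def synthon_screen(synthon_sets, score_synthon, score_product, synthon_cutoff, keep_top):
    """Two-stage screen: score building blocks, then enumerate and rescore products."""
    # Stage 1: keep only building blocks that score highly enough on their own.
    survivors = [
        [s for s in synthons if score_synthon(s) >= synthon_cutoff]
        for synthons in synthon_sets
    ]
    # Stage 2: enumerate products from surviving synthons only, then run the
    # (more expensive) full screen on each product and rank the results.
    products = list(product(*survivors))
    ranked = sorted(products, key=score_product, reverse=True)
    return ranked[:keep_top]


# Hypothetical building blocks with made-up scores.
TOY_SCORES = {
    "acid_1": 0.9, "acid_2": 0.2, "acid_3": 0.7,
    "amine_1": 0.8, "amine_2": 0.1, "amine_3": 0.5,
}
acids = ["acid_1", "acid_2", "acid_3"]
amines = ["amine_1", "amine_2", "amine_3"]


def toy_synthon_score(synthon):
    return TOY_SCORES[synthon]


def toy_product_score(combo):
    # Stand-in for a full screen of the assembled molecule.
    return sum(TOY_SCORES[s] for s in combo)


longlist = synthon_screen(
    [acids, amines], toy_synthon_score, toy_product_score,
    synthon_cutoff=0.5, keep_top=3,
)
print(longlist)
```

With N and M building blocks in a two-component reaction, the first stage costs N + M evaluations instead of N × M, and the full screen is run only on products of the surviving synthons; the resulting longlist would then go through the same triage as any other virtual screen.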

Updated: 2024-04-24