Computational scoring and experimental evaluation of enzymes generated by neural networks
Nature Biotechnology (IF 46.9), Pub Date: 2024-04-23, DOI: 10.1038/s41587-024-02214-2
Sean R. Johnson, Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, Kevin K. Yang

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70–90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50–150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.
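The abstract restricts tested sequences to those with 70–90% identity to the most similar natural sequence. As a rough illustration of what such a window check could look like, here is a minimal Python sketch. It assumes the sequences are already aligned to a common length (for example, rows of a multiple sequence alignment) with `-` as the gap character. The function names, the gap handling and the toy sequences are hypothetical and are not taken from the paper, which does not describe its identity calculation here.

```python
from typing import Iterable


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where both sequences have a gap ('-') are skipped;
    a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def max_identity_to_naturals(candidate: str, naturals: Iterable[str]) -> float:
    """Identity of the candidate to its most similar natural sequence."""
    return max(percent_identity(candidate, n) for n in naturals)


def in_identity_window(
    candidate: str,
    naturals: Iterable[str],
    low: float = 70.0,
    high: float = 90.0,
) -> bool:
    """True if the closest natural sequence falls inside [low, high] percent."""
    return low <= max_identity_to_naturals(candidate, naturals) <= high


if __name__ == "__main__":
    naturals = ["ACDEFGHIKL"]  # toy natural sequence, illustration only
    for candidate in ["ACDEFGHIKV", "AAAAAGHIKL"]:
        print(candidate, in_identity_window(candidate, naturals))
```

Real pipelines would normally compute identity from a pairwise or profile alignment produced by a dedicated tool. The equal-length assumption above only keeps the sketch self-contained.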




Updated: 2024-04-23