Accurate top protein variant discovery via low-N pick-and-validate machine learning
Cell Systems (IF 9.3) Pub Date: 2024-02-09, DOI: 10.1016/j.cels.2024.01.002
Hoi Yee Chu, John H.C. Fong, Dawn G.L. Thean, Peng Zhou, Frederic K.C. Fung, Yuanhua Huang, Alan S.L. Wong

A strategy that obtains the greatest number of best-performing variants with the least experimental effort across the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning by experimentally testing only a few predicted top variants. We find that four rounds of low-N pick-and-validate sampling of 12 variants for machine learning yielded the best accuracy, up to 92.6%, in selecting the true top 1% of variants in combinatorial mutant libraries; two rounds of 24 variants can also be used. We demonstrate that our strategy successfully discovers high-performance protein variants from diverse families, including CRISPR-based genome editors, supporting its generalizable application to protein engineering tasks. A record of this paper's transparent peer review process is included in the supplemental information.
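The loop described in the abstract (rank by zero-shot score, measure a handful of top picks, retrain, repeat) can be sketched roughly as follows. This is a toy illustration and not the authors' implementation: the function names `pick_and_validate` and `fit_additive_model`, the simulated library, and the simple per-site averaging surrogate are all assumptions made for this example, and the abstract does not say which model the paper uses. Only the round structure (four rounds of 12 variants) comes from the text.

```python
import itertools
import random


def fit_additive_model(measured):
    """Toy surrogate (an assumption, not the paper's model): score a variant
    by summing, over its sites, the mean measured fitness of the variants
    that carry the same residue at that site."""
    global_mean = sum(measured.values()) / len(measured)
    sums, counts = {}, {}
    for variant, fitness in measured.items():
        for site in enumerate(variant):  # site = (position, residue)
            sums[site] = sums.get(site, 0.0) + fitness
            counts[site] = counts.get(site, 0) + 1

    def predict(variant):
        # Sites never observed so far fall back to the global mean.
        return sum(
            sums[s] / counts[s] if s in counts else global_mean
            for s in enumerate(variant)
        )

    return predict


def pick_and_validate(variants, zero_shot_score, measure, rounds=4, per_round=12):
    """Hypothetical low-N pick-and-validate loop."""
    measured = {}
    score = zero_shot_score  # no labels yet, so round 1 uses the zero-shot ranking
    for _ in range(rounds):
        candidates = [v for v in variants if v not in measured]
        candidates.sort(key=score, reverse=True)
        for v in candidates[:per_round]:
            measured[v] = measure(v)  # stands in for the wet-lab validation step
        score = fit_additive_model(measured)  # retrain on everything measured
    return measured, score


# Simulated 4-site library with an additive ground truth (purely illustrative).
rng = random.Random(0)
alphabet, n_sites = "ACDE", 4
variants = ["".join(p) for p in itertools.product(alphabet, repeat=n_sites)]
site_effect = {(i, aa): rng.gauss(0, 1) for i in range(n_sites) for aa in alphabet}
truth = {v: sum(site_effect[s] for s in enumerate(v)) for v in variants}
zero_shot = {v: truth[v] + rng.gauss(0, 2) for v in variants}  # noisy prior

measured, model = pick_and_validate(variants, zero_shot.get, truth.get)
top = sorted(variants, key=truth.get, reverse=True)[: max(1, len(variants) // 100)]
hits = sum(v in measured for v in top)
print(f"measured {len(measured)} variants; recovered {hits} of the true top {len(top)}")
```

Calling `pick_and_validate(..., rounds=2, per_round=24)` gives the alternative schedule mentioned in the abstract with the same total budget of 48 measurements. The numbers printed by this toy simulation say nothing about the accuracy reported in the paper.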

Updated: 2024-02-09