Paper title
PiFold: Toward effective and efficient protein inverse folding
Paper authors
Paper abstract
How can we design protein sequences that fold into desired structures both effectively and efficiently? AI methods for structure-based protein design have attracted increasing attention in recent years; however, few methods improve accuracy and efficiency simultaneously, owing to a lack of expressive features and a reliance on autoregressive sequence decoders. To address these issues, we propose PiFold, which combines a novel residue featurizer with PiGNN layers to generate protein sequences in a one-shot manner with improved recovery. Experiments show that PiFold achieves 51.66% recovery on CATH 4.2 while running inference 70 times faster than autoregressive competitors. In addition, PiFold achieves 58.72% and 60.42% recovery on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the roles of different types of protein features and model designs, inspiring further simplification and improvement. The PyTorch code is available on GitHub (https://github.com/A4Bio/PiFold).
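The recovery figures quoted in the abstract refer to sequence recovery, which is commonly understood as the fraction of positions at which the designed residue matches the native one. The following is a minimal sketch of that per-protein calculation, not code from the PiFold repository; the function name `sequence_recovery` and the example sequences are hypothetical, and the abstract does not state how per-protein scores are aggregated over a benchmark.

```python
def sequence_recovery(predicted: str, native: str) -> float:
    """Return the fraction of positions where predicted and native residues match.

    Assumption: both sequences are aligned one-to-one (same length), as is the
    case when a sequence is designed for a fixed backbone.
    """
    if len(predicted) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


# Hypothetical 10-residue example: 8 of 10 positions agree.
native = "MKTAYIAKQR"
predicted = "MKSAYLAKQR"
print(f"{sequence_recovery(predicted, native):.2%}")
```

On this toy pair the function reports 80% recovery, since two of the ten positions differ.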