Title
Succinct Amyloid and Non-Amyloid Patterns in Hexapeptides
Authors
Abstract
Hexapeptides are widely used as a model system for studying the amyloid-forming properties of polypeptides, including proteins. Recently, large experimental databases with amyloidogenicity labels have become publicly available. Using these datasets for training and testing, one may build artificial intelligence (AI)-based classifiers for predicting the amyloid state of peptides. In our previous work (Biomolecules, 11(4), 500 (2021)) we described the Support Vector Machine (SVM)-based Budapest Amyloid Predictor (\url{https://pitgroup.org/bap}). Here we apply the Budapest Amyloid Predictor to discover numerous amyloidogenic and non-amyloidogenic hexapeptide patterns, with accuracy between 80\% and 84\%, as surprising and succinct novel rules for further understanding the amyloid state of peptides. For example, we show that, when each position marked by ``x'' is independently replaced by any residue, the patterns CxFLWx, FxFLFx, and xxIVIV are predicted to be amyloidogenic, while the patterns PxDxxx, xxKxEx, and xxPQxx are predicted to be non-amyloidogenic. We note that each amyloidogenic pattern with two x's (e.g., CxFLWx) succinctly describes $20^2=400$ hexapeptides, while each non-amyloidogenic pattern with four x's (e.g., PxDxxx) describes $20^4=160,000$ hexapeptides. To our knowledge, no similar applications of artificial intelligence tools, and no such succinct amyloid patterns, have been described before the present work.
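The pattern counts quoted in the abstract can be checked with a short sketch. The Python below is illustrative only: the helper names `expand_pattern` and `count_pattern` are hypothetical, the alphabet is assumed to be the 20 standard amino acids in one-letter code, and the code does not call the Budapest Amyloid Predictor or reproduce any of its predictions. It only enumerates the hexapeptides that a pattern with ``x'' positions stands for.

```python
from itertools import product

# Assumption: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expand_pattern(pattern, wildcard="x"):
    """Yield every peptide obtained by filling each wildcard with any residue."""
    # A fixed position contributes one choice; a wildcard contributes 20.
    slots = [AMINO_ACIDS if ch == wildcard else ch for ch in pattern]
    for combo in product(*slots):
        yield "".join(combo)


def count_pattern(pattern, wildcard="x"):
    """Number of peptides described by a pattern: 20 ** (number of wildcards)."""
    return len(AMINO_ACIDS) ** pattern.count(wildcard)


if __name__ == "__main__":
    for pattern in ("CxFLWx", "PxDxxx"):
        print(pattern, count_pattern(pattern))
```

Enumerating with `expand_pattern` is practical at this scale (at most 160,000 strings for four wildcards), so the resulting list could, for instance, be passed to any classifier to check how many members of a pattern receive the predicted label.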