Title
Protein Conformational States: A First Principles Bayesian Method
Authors
Abstract
Automated identification of protein conformational states from a simulated ensemble of structures is a hard problem because it requires teaching a computer to recognize shapes. We adapt the naive Bayes classifier from the machine learning community for use on atom-to-atom pairwise contacts. The result is an unsupervised learning algorithm that samples a "distribution" over potential classification schemes. We apply the classifier to a series of test structures and one real protein, showing that in most cases it identifies the conformational transition with >95% accuracy. A nontrivial feature of our adaptation is a new connection to information entropy that allows us to vary the level of structural detail without spoiling the categorization. We confirm this by comparing results as the number of atoms and the number of time samples are varied over 1.5 orders of magnitude. Further, because the method is derived from a Bayesian analysis of the set of inter-atomic contacts, it is easy to understand and to extend to more complex cases.
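The abstract does not spell out the classifier's update rules. As a rough illustration only, the sketch below assumes that each structure is reduced to a binary contact vector (an atom pair counts as "in contact" if it is closer than a hypothetical `cutoff`). It then uses a collapsed Gibbs sampler for a Bernoulli naive Bayes mixture, a standard unsupervised variant with Beta(a, b) priors on per-class contact probabilities and a symmetric Dirichlet(alpha) prior on class weights. Each sweep yields one classification scheme, so samples collected after burn-in approximate a distribution over schemes. This is not the authors' algorithm: the function names, priors, and cutoff are assumptions, and the paper's entropy-based treatment of structural detail is not modeled.

```python
import math
import random


def contact_vector(coords, cutoff):
    """Binary features for one structure: 1 if atoms i < j lie within cutoff (assumed contact rule)."""
    n = len(coords)
    return [1 if math.dist(coords[i], coords[j]) < cutoff else 0
            for i in range(n) for j in range(i + 1, n)]


def gibbs_naive_bayes(X, K, sweeps=200, alpha=1.0, a=1.0, b=1.0, seed=0):
    """Collapsed Gibbs sampling of class labels for a Bernoulli naive Bayes mixture.

    X: list of N binary contact vectors of length D; K: number of classes.
    Returns one label assignment (a classification scheme) per sweep.
    """
    rng = random.Random(seed)
    N, D = len(X), len(X[0])
    z = [rng.randrange(K) for _ in range(N)]   # random initial labels
    n = [0] * K                                # structures per class
    c = [[0] * D for _ in range(K)]            # contact counts per class and feature
    for i, k in enumerate(z):
        n[k] += 1
        for j in range(D):
            c[k][j] += X[i][j]

    samples = []
    for _ in range(sweeps):
        for i in range(N):
            # Remove structure i from its current class.
            k_old = z[i]
            n[k_old] -= 1
            for j in range(D):
                c[k_old][j] -= X[i][j]
            # Log of prior weight times Beta-Bernoulli predictive likelihood per class.
            logp = []
            for k in range(K):
                lp = math.log(n[k] + alpha)
                denom = math.log(n[k] + a + b)
                for j in range(D):
                    if X[i][j]:
                        lp += math.log(c[k][j] + a) - denom
                    else:
                        lp += math.log(n[k] - c[k][j] + b) - denom
                logp.append(lp)
            # Sample a new class from the normalized conditional.
            m = max(logp)
            weights = [math.exp(v - m) for v in logp]
            k_new = rng.choices(range(K), weights=weights)[0]
            z[i] = k_new
            n[k_new] += 1
            for j in range(D):
                c[k_new][j] += X[i][j]
        samples.append(list(z))
    return samples
```

For a trajectory, one would build `X = [contact_vector(frame, cutoff) for frame in frames]`, discard early sweeps as burn-in, and compare the remaining assignments up to label permutation. Collapsing out the contact probabilities keeps the sampler to simple count updates, which is the usual choice for Beta-Bernoulli models.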