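Machine Learning Scoring Functions for Drug Discoveries from Experimental and Computer-Generated Protein-Ligand Structures: Towards Per-Target Scoring Functions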

Title

Machine Learning Scoring Functions for Drug Discoveries from Experimental and Computer-Generated Protein-Ligand Structures: Towards Per-Target Scoring Functions

Authors

Pellicani, F., Dal Ben, D., Perali, A., Pilati, S.

Abstract

In recent years, machine learning has been proposed as a promising strategy to build accurate scoring functions for computational docking aimed at numerically empowered drug discovery. However, the latest studies have suggested that over-optimistic results had been reported due to the correlations present in the experimental databases used for training and testing. Here, we investigate the performance of an artificial neural network in binding affinity predictions, comparing results obtained using both experimental protein-ligand structures as well as larger sets of computer-generated structures created using commercial software. Interestingly, similar performances are obtained on both databases. We find a noticeable performance suppression when moving from random horizontal tests to vertical tests performed on target proteins not included in the training data. The possibility of training the network on relatively easily created computer-generated databases leads us to explore per-target scoring functions, trained and tested ad hoc on complexes including only one target protein. Encouraging results are obtained, depending on the type of protein being addressed.
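The distinction between horizontal and vertical tests can be illustrated with a minimal Python sketch. It assumes each protein-ligand complex is represented as a dictionary with a hypothetical "target" key naming its protein; the function names and toy data are illustrative only and are not taken from the authors' code. A horizontal split shuffles complexes at random, so the same target can appear in both training and test sets. A vertical split holds out entire targets. Grouping by target yields the single-protein datasets on which per-target scoring functions could be trained.

```python
import random
from collections import defaultdict

# Hypothetical toy data: each complex records its target protein and a ligand ID.
toy_complexes = [
    {"target": t, "ligand": f"lig{i}"}
    for i, t in enumerate(["A", "A", "B", "B", "C", "C"])
]


def horizontal_split(complexes, test_fraction=0.2, seed=0):
    """Random split: complexes of the same target may land in both sets."""
    rng = random.Random(seed)
    shuffled = list(complexes)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def vertical_split(complexes, test_targets):
    """Target-based split: all complexes of the held-out targets go to the test set."""
    held_out = set(test_targets)
    train = [c for c in complexes if c["target"] not in held_out]
    test = [c for c in complexes if c["target"] in held_out]
    return train, test


def per_target_sets(complexes):
    """Group complexes by target, giving one dataset per protein."""
    groups = defaultdict(list)
    for c in complexes:
        groups[c["target"]].append(c)
    return dict(groups)


if __name__ == "__main__":
    train, test = vertical_split(toy_complexes, test_targets=["C"])
    print("vertical:", len(train), "train /", len(test), "test")
    train, test = horizontal_split(toy_complexes, test_fraction=0.5, seed=1)
    print("horizontal:", len(train), "train /", len(test), "test")
    for target, subset in per_target_sets(toy_complexes).items():
        print(target, len(subset))
```

A vertical split never lets a held-out protein leak into training. It therefore probes generalization to unseen targets, which is the setting where the abstract reports the performance drop.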
