论文标题
分子生成机学习的生物学启发的多模式评估
A biologically-inspired multi-modal evaluation of molecular generative machine learning
论文作者
论文摘要
尽管生成模型最近在许多科学领域变得无处不在,但对其评估的关注较少。对于分子生成模型,最先进的是孤立或与其输入有关的输出。但是,它们的生物学和功能特性(例如配体 - 靶标相互作用)尚未得到解决。在这项研究中,提出了一种新型的生物学启发的基准,用于评估分子生成模型。具体而言,设计了三个不同的参考数据集,并引入了与药物发现过程直接相关的一组指标。特别是我们提出了一个娱乐指标,将药物目标亲和力预测和分子对接应用作为评估生成产量的互补技术。尽管所有三个指标在测试的生成模型中均表现出一致的结果,但对药物目标亲和力结合和分子对接得分进行了更详细的比较,表明单峰预测器可能会导致有关在分子水平和多模式方法上靶向结合的错误结论。该框架的关键优势在于,它通过明确关注配体 - 靶标相互作用,将先前的物理化学域知识纳入基准测试过程,从而创造了一种高效的工具,不仅是为了评估分子生成型的尤其是为了评估一般的药物发现过程。
While generative models have recently become ubiquitous in many scientific areas, less attention has been paid to their evaluation. For molecular generative models, the state-of-the-art examines their output in isolation or in relation to its input. However, their biological and functional properties, such as ligand-target interaction is not being addressed. In this study, a novel biologically-inspired benchmark for the evaluation of molecular generative models is proposed. Specifically, three diverse reference datasets are designed and a set of metrics are introduced which are directly relevant to the drug discovery process. In particular we propose a recreation metric, apply drug-target affinity prediction and molecular docking as complementary techniques for the evaluation of generative outputs. While all three metrics show consistent results across the tested generative models, a more detailed comparison of drug-target affinity binding and molecular docking scores revealed that unimodal predictiors can lead to erroneous conclusions about target binding on a molecular level and a multi-modal approach is thus preferrable. The key advantage of this framework is that it incorporates prior physico-chemical domain knowledge into the benchmarking process by focusing explicitly on ligand-target interactions and thus creating a highly efficient tool not only for evaluating molecular generative outputs in particular, but also for enriching the drug discovery process in general.