AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach

Paper title

AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach

Authors

Justin Sybrandt, Ilya Tyagin, Michael Shtutman, Ilya Safro

Abstract

Medical research is risky and expensive. Drug discovery, for example, requires that researchers efficiently winnow thousands of potential targets down to a small candidate set for more thorough evaluation. However, research groups spend significant time and money on the experiments needed to determine this candidate set long before seeing intermediate results. Hypothesis generation systems address this challenge by mining the wealth of publicly available scientific information to predict plausible research directions. We present AGATHA, a deep-learning hypothesis generation system that can introduce data-driven insights earlier in the discovery process. Through a learned ranking criterion, this system quickly prioritizes plausible term-pairs among entity sets, allowing us to recommend new research directions. We validate our system at scale with a temporal holdout, in which we predict connections first introduced after 2015 using only data published beforehand. We additionally explore biomedical subdomains and demonstrate AGATHA's predictive capacity across the twenty most popular relationship types. This system achieves best-in-class performance on an established benchmark and demonstrates high recommendation scores across subdomains. Reproducibility: All code, experimental data, and pre-trained models are available online: sybrandt.com/2020/agatha
