Paper Title

SynKB: Semantic Search for Synthetic Procedures

Authors

Fan Bai, Alan Ritter, Peter Madrid, Dayne Freitag, John Niekrasz

Abstract

In this paper, we present SynKB, an open-source, automatically extracted knowledge base of chemical synthesis protocols. Similar to proprietary chemistry databases such as Reaxys, SynKB allows chemists to retrieve structured knowledge about synthetic procedures. By taking advantage of recent advances in natural language processing for procedural texts, SynKB supports more flexible queries about reaction conditions, and thus has the potential to help chemists search the literature for conditions used in relevant reactions as they design new synthetic routes. Using customized Transformer models to automatically extract information from 6 million synthesis procedures described in U.S. and EU patents, we show that for many queries, SynKB has higher recall than Reaxys, while maintaining high precision. We plan to make SynKB available as an open-source tool; in contrast, proprietary chemistry databases require costly subscriptions.
