Similarity of Precursors in Solid-state Synthesis as Text-Mined from Scientific Literature

Title


Similarity of Precursors in Solid-state Synthesis as Text-Mined from Scientific Literature

Authors

He, Tanjin; Sun, Wenhao; Huo, Haoyan; Kononova, Olga; Rong, Ziqin; Tshitoyan, Vahe; Botari, Tiago; Ceder, Gerbrand

Abstract


Collecting and analyzing the vast amount of information available in the solid-state chemistry literature may accelerate our understanding of materials synthesis. However, one major problem is the difficulty of identifying which materials in a synthesis paragraph are precursors and which are target materials. In this study, we developed a two-step Chemical Named Entity Recognition (CNER) model to identify precursors and targets, based on information from the context around material entities. Using the extracted data, we conducted a meta-analysis to study the similarities and differences between precursors in the context of solid-state synthesis. To quantify precursor similarity, we built a substitution model to calculate the viability of substituting one precursor with another while retaining the target. From a hierarchical clustering of the precursors, we demonstrate that "chemical similarity" of precursors can be extracted from text data. Quantifying the similarity of precursors helps provide a foundation for suggesting candidate reactants in a predictive synthesis model.
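To make the idea of a precursor substitution score followed by hierarchical clustering more concrete, here is a minimal sketch. It is not the authors' substitution model. Every choice in it is an illustrative assumption: a recipe is a hypothetical `(target, set_of_precursors)` pair; two recipes for the same target that differ by exactly one precursor each count as one observed swap; pair counts are normalized with a Jaccard-style ratio (an assumed choice); and precursors are grouped with plain average-linkage agglomerative clustering stopped at a distance cutoff. The helper names and the toy recipes are invented for illustration only.

```python
from collections import Counter
from itertools import combinations


def substitution_counts(recipes):
    """Count precursor pairs {a, b} such that two recipes for the same target
    differ only by replacing a with b (a one-for-one swap)."""
    by_target = {}
    for target, precursors in recipes:
        by_target.setdefault(target, []).append(frozenset(precursors))
    counts = Counter()
    for sets in by_target.values():
        for s1, s2 in combinations(sets, 2):
            only1, only2 = s1 - s2, s2 - s1
            if len(only1) == 1 and len(only2) == 1:
                counts[frozenset(only1 | only2)] += 1
    return counts


def make_distance(counts):
    """Hypothetical normalization: Jaccard-style similarity over swap events,
    converted to a distance in [0, 1]."""
    totals = Counter()  # number of swap events each precursor takes part in
    for pair, n in counts.items():
        for p in pair:
            totals[p] += n

    def distance(a, b):
        n_ab = counts.get(frozenset((a, b)), 0)
        if n_ab == 0:
            return 1.0  # never observed substituting for each other
        return 1.0 - n_ab / (totals[a] + totals[b] - n_ab)

    return distance


def average_linkage(items, distance, threshold):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance, stopping once that distance reaches
    `threshold`."""
    clusters = [[x] for x in items]

    def link(c1, c2):
        return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (link(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d >= threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy (target, precursors) recipes, invented for illustration only.
recipes = [
    ("BaTiO3", {"BaCO3", "TiO2"}),
    ("BaTiO3", {"BaO", "TiO2"}),
    ("SrTiO3", {"SrCO3", "TiO2"}),
    ("SrTiO3", {"SrO", "TiO2"}),
    ("LiCoO2", {"Li2CO3", "Co3O4"}),
    ("LiCoO2", {"LiOH", "Co3O4"}),
    ("LiCoO2", {"Li2O", "Co3O4"}),
    ("LiCoO2", {"Li2CO3", "CoO"}),
]

counts = substitution_counts(recipes)
items = sorted({p for _, ps in recipes for p in ps})
clusters = average_linkage(items, make_distance(counts), threshold=0.8)
print(clusters)
```

On this toy input the Ba, Sr, Co, and Li precursors each end up in their own cluster (the three lithium sources merge at distance 2/3), while TiO2, which is never swapped, stays alone. This mirrors, in a deliberately tiny setting, the kind of element-wise "chemical similarity" that clustering of text-mined substitutions can surface. In practice one would likely use a library routine such as SciPy's hierarchical clustering instead of the hand-written loop.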
