Paper Title

Integrating Heterogeneous Domain Information into Relation Extraction: A Case Study on Drug-Drug Interaction Extraction

Author

Asada, Masaki

Abstract

The development of deep neural networks has improved representation learning in various domains, including textual, graph-structural, and relational-triple representations. This development has opened the door to new forms of relation extraction beyond traditional text-oriented relation extraction. However, the effectiveness of considering information from multiple heterogeneous domains simultaneously is still under exploration, and a model that can take advantage of integrating heterogeneous information is expected to contribute significantly to many real-world problems. This thesis takes the extraction of drug-drug interactions (DDIs) from the literature as a case study and realizes relation extraction that utilizes heterogeneous domain information. First, a deep neural relation extraction model is prepared and its attention mechanism is analyzed. Next, a method that combines drug molecular structure information and drug description information with the input sentence information is proposed, and the effectiveness of utilizing drug molecular structures and drug descriptions for the relation extraction task is shown. Then, to further exploit heterogeneous information, drug-related items such as protein entries, medical terms, and pathways are collected from multiple existing databases, and a new data set in the form of a knowledge graph (KG) is constructed. A link prediction task is conducted on the constructed data set to obtain embedding representations of drugs that contain the heterogeneous domain information. Finally, a method that integrates the input sentence information and the heterogeneous KG information is proposed. The proposed model is trained and evaluated on a widely used data set, and the results show that utilizing heterogeneous domain information significantly improves the performance of relation extraction from the literature.
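The last two steps of the abstract (link prediction on a KG to obtain drug embeddings, then integration with sentence information) can be illustrated with a toy sketch. The abstract does not name a scoring function or a fusion method, so everything below is an assumption made for illustration: a TransE-style score is swapped in for link prediction, plain concatenation is used for fusion, and the names `transe_score`, `rank_tails`, `fuse`, and all vectors are hypothetical.

```python
import math

def transe_score(head, relation, tail):
    """TransE-style plausibility of (head, relation, tail).

    Assumption: the thesis abstract does not specify a scoring function;
    TransE is used here only as a familiar stand-in. The score is the
    negative Euclidean distance, so values closer to 0 are more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def rank_tails(head, relation, candidates):
    """Link prediction: sort candidate tail names from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )

def fuse(sentence_vec, drug1_vec, drug2_vec):
    """Assumed fusion: concatenate the sentence vector with both drug embeddings."""
    return list(sentence_vec) + list(drug1_vec) + list(drug2_vec)

# Made-up 2-dimensional embeddings, for illustration only.
drug_a = [1.0, 0.0]
drug_b = [0.0, 2.0]
targets = [0.0, 1.0]  # hypothetical embedding of a "targets" relation
proteins = {"protein_x": [1.0, 1.0], "protein_y": [-1.0, 0.0]}

ranking = rank_tails(drug_a, targets, proteins)
features = fuse([0.5, 0.5, 0.5], drug_a, drug_b)
```

In a real system the embeddings would be learned by training on the KG rather than written by hand, and the fused vector would be passed to a classifier over DDI types; neither part is shown here.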
