Paper title

Learning Informative Representations of Biomedical Relations with Latent Variable Models

Authors

Harshil Shah, Julien Fauqueur

Abstract

Extracting biomedical relations from large corpora of scientific documents is a challenging natural language processing task. Existing approaches usually focus on identifying a relation either in a single sentence (mention-level) or across an entire corpus (pair-level). In both cases, recent methods have achieved strong results by learning a point estimate to represent the relation; this is then used as the input to a relation classifier. However, the relation expressed in text between a pair of biomedical entities is often more complex than can be captured by a point estimate. To address this issue, we propose a latent variable model with an arbitrarily flexible distribution to represent the relation between an entity pair. Additionally, our model provides a unified architecture for both mention-level and pair-level relation extraction. We demonstrate that our model achieves results competitive with strong baselines for both tasks while having fewer parameters and being significantly faster to train. We make our code publicly available.
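The abstract contrasts representing a relation with a single point estimate against representing it with a latent variable that has a flexible distribution. The sketch below is a minimal, hypothetical illustration in plain Python. It is not the authors' released code. It assumes that a mention (sentence) encoding `h` is already produced by some encoder. It also obtains a non-Gaussian distribution by passing `h` together with Gaussian noise through a nonlinearity, which gives an implicit distribution. Every name, every size, and the choice to average mention-level predictions for the pair level are illustrative assumptions. Training is omitted.

```python
import math
import random
from typing import List

Vector = List[float]

# Toy sizes chosen for illustration only (hypothetical, not from the paper).
HIDDEN_DIM = 4     # size of a mention (sentence) encoding h
NOISE_DIM = 3      # size of the noise vector eps
LATENT_DIM = 3     # size of the latent relation representation z
NUM_RELATIONS = 2  # number of relation classes


def linear(weights: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Computes W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng: random.Random, rows: int, cols: int) -> List[Vector]:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class LatentRelationModel:
    """Toy latent variable relation classifier (hypothetical sketch, untrained).

    Instead of mapping h to one point estimate, it draws samples
    z = tanh(W_z [h; eps] + b_z) with eps ~ N(0, I). Because z is a
    nonlinear function of h and noise, q(z | h) is an implicit
    distribution that need not be Gaussian.
    """

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.w_z = random_matrix(self.rng, LATENT_DIM, HIDDEN_DIM + NOISE_DIM)
        self.b_z = [0.0] * LATENT_DIM
        self.w_r = random_matrix(self.rng, NUM_RELATIONS, LATENT_DIM)
        self.b_r = [0.0] * NUM_RELATIONS

    def sample_z(self, h: Vector) -> Vector:
        # Concatenate the mention encoding with fresh noise, then apply the network.
        eps = [self.rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
        return [math.tanh(v) for v in linear(self.w_z, self.b_z, h + eps)]

    def predict_mention(self, h: Vector, num_samples: int = 16) -> Vector:
        """Mention level: Monte Carlo average of p(r | z) over samples of z."""
        totals = [0.0] * NUM_RELATIONS
        for _ in range(num_samples):
            probs = softmax(linear(self.w_r, self.b_r, self.sample_z(h)))
            totals = [t + p for t, p in zip(totals, probs)]
        return [t / num_samples for t in totals]

    def predict_pair(self, mentions: List[Vector], num_samples: int = 16) -> Vector:
        """Pair level (assumed aggregation): average mention-level predictions."""
        per_mention = [self.predict_mention(h, num_samples) for h in mentions]
        return [sum(col) / len(per_mention) for col in zip(*per_mention)]


if __name__ == "__main__":
    model = LatentRelationModel(seed=42)
    h = [0.1, -0.3, 0.8, 0.0]  # stand-in for a sentence encoder's output
    print(model.predict_mention(h))
    print(model.predict_pair([h, [0.5, 0.2, -0.1, 0.4]]))
```

Because `predict_pair` reuses `predict_mention`, a single set of parameters serves both settings. This loosely mirrors the unified mention-level and pair-level architecture that the abstract describes. The abstract does not specify how the paper actually aggregates mentions or trains the model.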