Paper Title
pyBART: Evidence-based Syntactic Transformations for IE
Paper Authors
Paper Abstract
Syntactic dependencies can be predicted with high accuracy and are useful for both machine-learned and pattern-based information extraction tasks. However, their utility can be improved. These dependencies are designed to accurately reflect syntactic relations, and they do not make semantic relations explicit. As a result, the representations lack many explicit connections between content words that would be useful for downstream applications. Proposals like English Enhanced UD improve the situation by extending universal dependency trees with additional explicit arcs. However, they are not available to Python users and are also limited in coverage. We introduce a broad-coverage, data-driven, and linguistically sound set of transformations that makes event structure and many lexical relations explicit. We present pyBART, an easy-to-use open-source Python library for converting English UD trees either to Enhanced UD graphs or to our representation. The library can work as a standalone package or be integrated within a spaCy NLP pipeline. When evaluated in a pattern-based relation extraction scenario, our representation results in higher extraction scores than Enhanced UD, while requiring fewer patterns.
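To make the idea of "additional explicit arcs" concrete, the following is a minimal, self-contained sketch. It does not use the pyBART API. The arc encoding, the function names `propagate_subjects` and `match_subject_pattern`, and the example sentence are all assumptions introduced for illustration. It shows one well-known enhancement, copying a subject to a conjoined verb, and how a single pattern then matches more relations.

```python
from typing import List, Set, Tuple

# Hypothetical encoding of a dependency arc: (head, relation, dependent).
Arc = Tuple[str, str, str]


def propagate_subjects(arcs: Set[Arc]) -> Set[Arc]:
    """Return the arcs plus an explicit nsubj arc for each conjunct
    that has no subject of its own (a toy version of one enhancement)."""
    enhanced = set(arcs)
    has_subject = {head for head, rel, _ in arcs if rel == "nsubj"}
    for head, rel, dep in arcs:
        if rel != "conj" or dep in has_subject:
            continue
        # Copy every subject of the first conjunct to the second one.
        for head2, rel2, subj in arcs:
            if head2 == head and rel2 == "nsubj":
                enhanced.add((dep, "nsubj", subj))
    return enhanced


def match_subject_pattern(arcs: Set[Arc]) -> List[Tuple[str, str]]:
    """A single extraction pattern: every (predicate, subject) pair."""
    return sorted((head, dep) for head, rel, dep in arcs if rel == "nsubj")


# Basic UD-style tree for "John sings and dances" (illustrative only).
tree = {
    ("sings", "nsubj", "John"),
    ("sings", "conj", "dances"),
    ("dances", "cc", "and"),
}
graph = propagate_subjects(tree)

print(match_subject_pattern(tree))   # only the subject of "sings"
print(match_subject_pattern(graph))  # also the subject of "dances"
```

On the plain tree, the pattern finds only the subject of the first verb. On the enhanced graph, the same pattern also recovers the subject of the conjoined verb, so no second, more complex pattern is needed to cover coordination.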