Paper title
Efficient Knowledge Graph Validation via Cross-Graph Representation Learning
Authors
Abstract
Recent advances in information extraction have motivated the automatic construction of huge Knowledge Graphs (KGs) by mining large-scale text corpora. However, automatic extraction unavoidably introduces noisy facts into these KGs. To validate the correctness of facts (i.e., triplets) inside a KG, one possible approach is to map the triplets into vector representations that capture the semantic meanings of the facts. Although many representation learning approaches have been developed for knowledge graphs, these methods are not effective for validation: they usually assume that facts are correct, and thus may overfit noisy facts and fail to detect them. Towards effective KG validation, we propose to leverage an external human-curated KG as an auxiliary information source to help detect the errors in a target KG. The external KG is built upon human-curated knowledge repositories and tends to have high precision. On the other hand, although the target KG built by information extraction from texts has low precision, it can cover new or domain-specific facts that are not in any human-curated repository. To tackle this challenging task, we propose a cross-graph representation learning framework, CrossVal, which can leverage an external KG to validate the facts in the target KG efficiently. This is achieved by embedding triplets based on their semantic meanings, drawing cross-KG negative samples, and estimating a confidence score for each triplet based on its degree of correctness. We evaluate the proposed framework on datasets across different domains. Experimental results show that the proposed framework achieves the best performance compared with the state-of-the-art methods on large-scale KGs.
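The three ingredients named in the abstract (triplet embedding, cross-KG negative sampling, and a per-triplet confidence score) can be illustrated with a minimal sketch. This is not the CrossVal implementation: the abstract does not specify the scoring function or the sampling rule, so the sketch assumes a TransE-style distance score, a logistic confidence, and a sampler that discards corrupted triplets found in either KG. All function names, the `offset` parameter, and the hand-set toy embeddings are hypothetical, and no training step is shown.

```python
import math
import random


def transe_score(h, r, t):
    """Assumed TransE-style score: negative Euclidean distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def confidence(score, offset=1.0):
    """Map a score into (0, 1) with a logistic function; `offset` is a hypothetical shift."""
    return 1.0 / (1.0 + math.exp(-(score + offset)))


def cross_kg_negatives(triplet, target_kg, external_kg, entities, k, rng):
    """Corrupt the tail of `triplet`, skipping any candidate present in either KG.

    Assumption: the external KG is used only to avoid sampling true facts as negatives.
    """
    head, rel, _ = triplet
    known = target_kg | external_kg
    candidates = [(head, rel, e) for e in entities if (head, rel, e) not in known]
    rng.shuffle(candidates)
    return candidates[:k]


# Toy embeddings, set by hand for illustration (a real system would learn them).
emb = {
    "paris": [0.0, 0.0],
    "madrid": [-2.0, 2.0],
    "france": [1.0, 0.0],
    "spain": [-1.0, 2.0],
    "italy": [3.0, 3.0],
    "capital_of": [1.0, 0.0],
}

# Target KG (extracted, noisy) and external KG (curated), as sets of triplets.
target_kg = {("paris", "capital_of", "france"), ("paris", "capital_of", "spain")}
external_kg = {("paris", "capital_of", "france"), ("madrid", "capital_of", "spain")}
entities = ["paris", "madrid", "france", "spain", "italy"]


def triplet_confidence(triplet):
    """Confidence of a triplet under the toy embeddings."""
    h, r, t = triplet
    return confidence(transe_score(emb[h], emb[r], emb[t]))


negatives = cross_kg_negatives(
    ("paris", "capital_of", "france"), target_kg, external_kg, entities,
    k=2, rng=random.Random(0),
)
good = triplet_confidence(("paris", "capital_of", "france"))
noisy = triplet_confidence(("paris", "capital_of", "spain"))
```

Under these toy embeddings the triplet that fits the translation pattern receives a higher confidence than the noisy one, and the sampled negatives never coincide with a fact recorded in either graph.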