Paper title
Coupling semantic and statistical techniques for dynamically enriching web ontologies
Paper authors
Paper abstract
With the development of Semantic Web technologies, the use of ontologies to store and retrieve information covering several domains has increased. However, very few ontologies are able to cope with the ever-growing need for frequently updated semantic information or for specific user requirements in specialized domains. As a result, a critical issue is the unavailability of relational information between concepts, also termed missing background knowledge. One solution to this issue relies on the manual enrichment of ontologies by domain experts, which is, however, a time-consuming and costly process; hence the need for dynamic ontology enrichment. In this paper, we present an automatic coupled statistical/semantic framework for dynamically enriching large-scale generic ontologies from the World Wide Web. Using the massive amount of information encoded in texts on the Web as a corpus, missing background knowledge can be discovered through a combination of semantic relatedness measures and pattern acquisition techniques, and subsequently exploited. The benefits of our approach are: (i) the dynamic enrichment of large-scale generic ontologies with missing background knowledge, thus enabling the reuse of such knowledge, and (ii) an answer to the costly manual enrichment of ontologies by domain experts. Experimental results in a precision-based evaluation setting demonstrate the effectiveness of the proposed techniques.
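The abstract does not say which relatedness measure or which patterns the framework uses. As a rough illustration only, the sketch below couples one well-known statistical signal, the Normalized Google Distance computed from Web hit counts, with a few Hearst-style lexico-syntactic patterns. It proposes a missing is-a relation only when both signals agree. The hit counts, the `0.5` threshold, and the helper names (`ngd`, `extract_isa`, `propose_relation`) are hypothetical assumptions, not the authors' method. A real system would obtain counts and text snippets from a search engine or a Web corpus.

```python
import math
import re
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (hyponym, hypernym)


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance computed from page-hit counts.

    fx, fy: hits for each term alone; fxy: hits for both together;
    n: (estimated) number of indexed pages. Smaller = more related.
    """
    if min(fx, fy, fxy) <= 0:
        return math.inf  # no co-occurrence evidence at all
    lx, ly, lxy, ln = (math.log(v) for v in (fx, fy, fxy, n))
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


# Hearst-style lexico-syntactic patterns; each maps a match to (hyponym, hypernym).
_PATTERNS = [
    (re.compile(r"\b(\w+) such as (\w+)"), lambda m: (m.group(2), m.group(1))),
    (re.compile(r"\b(\w+) including (\w+)"), lambda m: (m.group(2), m.group(1))),
    (re.compile(r"\b(\w+) and other (\w+)"), lambda m: (m.group(1), m.group(2))),
]


def extract_isa(snippets: List[str]) -> Set[Pair]:
    """Collect candidate (hyponym, hypernym) pairs attested in text snippets."""
    pairs: Set[Pair] = set()
    for text in snippets:
        for pattern, to_pair in _PATTERNS:
            for m in pattern.finditer(text.lower()):
                pairs.add(to_pair(m))
    return pairs


def propose_relation(hypo: str, hyper: str, counts: Tuple[int, int, int],
                     snippets: List[str], n: int,
                     threshold: float = 0.5) -> Optional[Pair]:
    """Propose a missing is-a edge only when both signals agree.

    counts = (hits(hypo), hits(hyper), hits(hypo AND hyper)); all hypothetical.
    """
    fx, fy, fxy = counts
    related = ngd(fx, fy, fxy, n) <= threshold          # statistical signal
    attested = (hypo, hyper) in extract_isa(snippets)   # pattern signal
    return (hypo, hyper) if related and attested else None


if __name__ == "__main__":
    snippets = ["Instruments such as violins are popular."]
    # Made-up counts: NGD is about 0.26 here, below the 0.5 threshold.
    print(propose_relation("violins", "instruments",
                           (10**5, 10**6, 5 * 10**4), snippets, 10**10))
    # ('violins', 'instruments')
```

This design requires agreement between the two signals. Co-occurrence statistics alone show that two terms are related but not *how*. Patterns alone are noisy on Web text. Requiring both trades recall for precision, which is consistent with the precision-based evaluation the abstract mentions.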