Paper Title
Octet: Online Catalog Taxonomy Enrichment with Self-Supervision
Paper Authors
Paper Abstract
Taxonomies have found wide applications in various domains, especially online for item categorization, browsing, and search. Despite the prevalent use of online catalog taxonomies, most of them in practice are maintained by humans, which is labor-intensive and difficult to scale. While taxonomy construction from scratch is considerably studied in the literature, how to effectively enrich existing incomplete taxonomies remains an open yet important research question. Taxonomy enrichment requires not only robustness to deal with emerging terms but also consistency between the existing taxonomy structure and the attachment of new terms. In this paper, we present a self-supervised end-to-end framework, Octet, for Online Catalog Taxonomy EnrichmenT. Octet leverages heterogeneous information unique to online catalog taxonomies, such as user queries, items, and their relations to the taxonomy nodes, while requiring no supervision other than the existing taxonomies. We propose to distantly train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure as well as the query-item-taxonomy interactions for term attachment. Extensive experiments in different online domains demonstrate the superiority of Octet over state-of-the-art methods via both automatic and human evaluations. Notably, Octet enriches an online catalog taxonomy in production to twice its original size in the open-world evaluation.
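Distant training of the term extractor means the existing taxonomy itself supplies the labels: user queries are tagged wherever they mention a known node name, and those silver labels train a sequence labeler. The sketch below shows one plausible way to produce such labels, assuming whitespace tokenization, lowercasing, greedy longest-match against node names, and a BIO scheme with a single hypothetical `TERM` type. The function name `distant_bio_labels` and these matching rules are illustrative assumptions, not Octet's actual labeling procedure.

```python
from typing import Iterable, List, Tuple


def distant_bio_labels(query: str, taxonomy_terms: Iterable[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: tag query tokens with BIO labels by matching
    existing taxonomy node names (distant supervision, no human labels).

    Assumptions: whitespace tokenization, case-insensitive exact matching,
    and greedy longest match scanning left to right.
    """
    tokens = query.lower().split()
    # Tokenize each known term once; try longer terms first so that
    # "ground coffee" is preferred over "coffee" at the same position.
    term_tokens = sorted(
        {tuple(t.lower().split()) for t in taxonomy_terms if t.strip()},
        key=len,
        reverse=True,
    )
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for term in term_tokens:
            n = len(term)
            if tuple(tokens[i:i + n]) == term:
                # First token of a matched span is B, the rest are I.
                labels[i] = "B-TERM"
                for j in range(i + 1, i + n):
                    labels[j] = "I-TERM"
                i += n
                break
        else:
            # No known term starts here; leave the token as O and move on.
            i += 1
    return list(zip(tokens, labels))


if __name__ == "__main__":
    print(distant_bio_labels("organic ground coffee beans", {"coffee", "ground coffee", "snacks"}))
```

For the query "organic ground coffee beans" with node names {"coffee", "ground coffee", "snacks"}, this yields `[('organic', 'O'), ('ground', 'B-TERM'), ('coffee', 'I-TERM'), ('beans', 'O')]`. Pairs like these could then serve as silver training data for a neural sequence labeler, which can generalize to spans (for example, new product types) that never appear verbatim in the taxonomy. Exact string matching is a deliberately simple choice here; it is noisy but needs no annotation.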