Paper Title


UCTopic: Unsupervised Contrastive Learning for Phrase Representations and Topic Mining

Authors

Jiacheng Li, Jingbo Shang, Julian McAuley

Abstract


High-quality phrase representations are essential to finding topics and related terms in documents (a.k.a. topic mining). Existing phrase representation learning methods either simply combine unigram representations in a context-free manner or rely on extensive annotations to learn context-aware knowledge. In this paper, we propose UCTopic, a novel unsupervised contrastive learning framework for context-aware phrase representations and topic mining. UCTopic is pretrained at a large scale to distinguish whether the contexts of two phrase mentions have the same semantics. The key to pretraining is positive pair construction from our phrase-oriented assumptions. However, we find traditional in-batch negatives cause performance decay when finetuning on a dataset with a small number of topics. Hence, we propose cluster-assisted contrastive learning (CCL), which largely reduces noisy negatives by selecting negatives from clusters and further improves phrase representations for topics accordingly. UCTopic outperforms the state-of-the-art phrase representation model by 38.2% NMI on average on four entity clustering tasks. Comprehensive evaluation on topic mining shows that UCTopic can extract coherent and diverse topical phrases.
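The core idea of cluster-assisted contrastive learning described above can be sketched in a few lines: instead of treating every other in-batch example as a negative (which is noisy when there are few topics, since many in-batch examples share the anchor's topic), negatives are drawn only from clusters other than the anchor's. The following is a minimal illustrative sketch, not the authors' implementation; the function names, the use of NumPy, and the plain InfoNCE formulation are all assumptions for exposition.

```python
import numpy as np

def cluster_assisted_negatives(embeddings, labels, anchor_idx, k=5, rng=None):
    """Select k negative indices for the anchor from *other* clusters only,
    avoiding the noisy in-batch negatives that share the anchor's cluster."""
    rng = rng or np.random.default_rng(0)
    other = np.flatnonzero(labels != labels[anchor_idx])  # candidates outside the anchor's cluster
    return rng.choice(other, size=min(k, len(other)), replace=False)

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE contrastive loss: pull the positive close to the anchor,
    push the selected negatives away, under temperature tau."""
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([sim(anchor, positive)] +
                      [sim(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0
```

In this sketch the cluster labels would come from an off-the-shelf clustering (e.g. k-means) over the current phrase embeddings; the loss is then minimized so that phrases whose contexts share semantics move together while phrases from other clusters are pushed apart.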
