Paper title

Combining word embeddings and convolutional neural networks to detect duplicated questions

Author

Dimitrov, Yoan

Abstract

Detecting semantic similarity between sentences remains a challenge because natural language is ambiguous. In this work, we propose a simple approach to identifying semantically similar questions that combines the strengths of word embeddings and Convolutional Neural Networks (CNNs). In addition, we demonstrate how the cosine similarity metric can be used to compare feature vectors effectively. Our network is trained on the Quora dataset, which contains over 400k question pairs. We experiment with different embedding approaches, such as Word2Vec, fastText, and Doc2Vec, and investigate the effects these approaches have on model performance. Our model achieves competitive results on the Quora dataset and adds to the well-established evidence that CNNs can be used for paraphrase detection tasks.
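
As a rough illustration of the cosine similarity comparison mentioned in the abstract, the sketch below scores two feature vectors and applies a decision threshold. It is not the author's implementation: the function names, the plain-list vector representation, the zero-vector convention, and the 0.8 threshold are all assumptions made for this example, and the feature vectors would in practice come from the CNN rather than being written by hand.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumed convention: a zero vector is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def is_duplicate(features_q1, features_q2, threshold=0.8):
    """Flag a question pair as duplicated when similarity reaches the threshold.

    The threshold value is hypothetical; it is not taken from the paper.
    """
    return cosine_similarity(features_q1, features_q2) >= threshold
```

Because cosine similarity depends only on the angle between the vectors, two feature vectors that point in the same direction score close to 1 regardless of their magnitudes, while orthogonal vectors score 0.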
