Paper title

Efficient comparison of sentence embeddings

Authors

Spyros Zoupanos, Stratis Kolovos, Athanasios Kanavos, Orestis Papadimitriou, Manolis Maragoudakis

Abstract

Natural language processing (NLP) has evolved considerably in recent years and has benefited greatly from recent developments in word and sentence embeddings. Such embeddings transform complex NLP tasks, such as semantic similarity or question answering (Q&A), into much simpler vector comparisons. However, this transformation raises new challenges, such as the efficient comparison and manipulation of embeddings. In this work, we discuss various word and sentence embedding algorithms, select a sentence embedding algorithm, BERT, as our algorithm of choice, and evaluate the performance of two vector comparison approaches, FAISS and Elasticsearch, on the specific problem of sentence embeddings. According to the results, FAISS outperforms Elasticsearch in a centralized environment with only one node, especially on large datasets.
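To make the phrase "vector comparisons" concrete, the following is a minimal sketch of the exhaustive baseline that similarity-search tools are designed to speed up: cosine similarity between a query vector and every vector in a small corpus. It does not use FAISS, Elasticsearch, or BERT. The three-dimensional vectors are made-up stand-ins for real sentence embeddings, and the function names `cosine_similarity` and `top_k` are hypothetical helpers introduced only for this illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    k: int = 1,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k corpus vectors most similar to query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(corpus)]
    # Exhaustive scan followed by a sort: the cost grows with corpus size.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for sentence embeddings (assumed values, not real data).
corpus = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [0.9, 0.1, 0.0]

nearest = top_k(query, corpus, k=2)
for index, score in nearest:
    print(index, round(score, 3))
```

Because every query scans the whole corpus, the cost of this approach grows with the number of stored vectors, which is why the choice of comparison engine matters more as datasets get larger.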
