Homophone Reveals the Truth: A Reality Check for Speech2Vec

Paper title

Homophone Reveals the Truth: A Reality Check for Speech2Vec

Author

Chen, Guangyu

Abstract

Generating spoken word embeddings that possess semantic information is a fascinating topic. Compared with text-based embeddings, they cover both phonetic and semantic characteristics, which can provide richer information and are potentially helpful for improving ASR and speech translation systems. In this paper, we review and examine the authenticity of a seminal work in this field: Speech2Vec. First, a homophone-based inspection method is proposed to check the speech embeddings released by the author of Speech2Vec. There is no indication that these embeddings are generated by the Speech2Vec model. Moreover, through further analysis of the vocabulary composition, we suspect that a text-based model fabricates these embeddings. Finally, we reproduce the Speech2Vec model, referring to the official code and optimal settings in the original paper. Experiments showed that this model failed to learn effective semantic embeddings. In word similarity benchmarks, it gets a correlation score of 0.08 in MEN and 0.15 in WS-353-SIM tests, which is over 0.5 lower than those described in the original paper. Our data and code are available.
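The abstract does not spell out how the homophone-based inspection works, so the following is only a minimal sketch under one stated assumption: embeddings learned from audio cannot easily tell homophones apart (they sound identical), so homophone pairs should sit noticeably closer together than unrelated pairs. The vectors, word pairs, and function names below are hypothetical toy values for illustration, not data or code from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pair_similarity(embeddings, pairs):
    """Average cosine similarity over the pairs found in the vocabulary."""
    sims = [
        cosine(embeddings[a], embeddings[b])
        for a, b in pairs
        if a in embeddings and b in embeddings
    ]
    if not sims:
        raise ValueError("no pair is covered by the embedding vocabulary")
    return sum(sims) / len(sims)


# Hypothetical toy vectors: homophones are placed close together, as one
# would expect (by assumption) from a model trained on audio.
toy_embeddings = {
    "flour": [0.90, 0.10, 0.00],
    "flower": [0.88, 0.12, 0.02],
    "sea": [0.10, 0.90, 0.10],
    "see": [0.12, 0.85, 0.15],
}

homophone_pairs = [("flour", "flower"), ("sea", "see")]
control_pairs = [("flour", "sea"), ("flower", "see")]  # unrelated words

homophone_score = mean_pair_similarity(toy_embeddings, homophone_pairs)
control_score = mean_pair_similarity(toy_embeddings, control_pairs)

# A small or negative gap on real embeddings would suggest the vectors
# carry little acoustic information.
print(f"homophones: {homophone_score:.3f}  controls: {control_score:.3f}")
```

On released embeddings, one would load the real vectors in place of `toy_embeddings` and use a curated homophone list; a gap close to zero between the two scores would be consistent with the abstract's suspicion that the vectors came from a text-based model.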
