Paper Title

Why Not Simply Translate? A First Swedish Evaluation Benchmark for Semantic Similarity

Authors

Tim Isbister, Magnus Sahlgren

Abstract

This paper presents the first Swedish evaluation benchmark for textual semantic similarity. The benchmark is compiled by simply running the English STS-B dataset through the Google machine translation API. This paper discusses potential problems with using such a simple approach to compile a Swedish evaluation benchmark, including translation errors, vocabulary variation, and productive compounding. Despite some obvious problems with the resulting dataset, we use the benchmark to compare the majority of the currently existing Swedish text representations, demonstrating that native models outperform multilingual ones, and that simple bag of words performs remarkably well.
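To make the bag-of-words comparison concrete, here is a minimal sketch of how such a baseline could be scored on STS-style sentence pairs. It is not the authors' implementation. The whitespace tokenizer, the helper names `bow_cosine` and `pearson`, the choice of Pearson correlation as the metric, and the three Swedish toy pairs with their gold scores are all assumptions made for illustration; none of them come from the paper or the benchmark.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between lowercase, whitespace-tokenized count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson(xs, ys) -> float:
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical toy pairs (NOT taken from the benchmark):
# (sentence 1, sentence 2, gold similarity on a 0-5 scale)
pairs = [
    ("en man spelar gitarr", "en man spelar gitarr", 5.0),
    ("en man spelar gitarr", "en kvinna spelar piano", 1.5),
    ("en hund springer i parken", "katten sover", 0.0),
]

predicted = [bow_cosine(s1, s2) for s1, s2, _ in pairs]
gold = [g for _, _, g in pairs]

# Correlation between predicted similarities and gold scores.
print(pearson(predicted, gold))
```

A whitespace tokenizer treats each Swedish compound as a single token, so two sentences that express the same concept with different compounds share no vocabulary under this scheme. This is one way the productive compounding mentioned in the abstract can affect a bag-of-words score.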
