Title
On the Evaluation Metrics for Paraphrase Generation
Authors
Abstract
In this paper, we revisit automatic metrics for paraphrase evaluation and obtain two findings that contradict conventional wisdom: (1) reference-free metrics achieve better performance than their reference-based counterparts, and (2) the most commonly used metrics do not align well with human annotation. We explore the underlying reasons behind these findings through additional experiments and in-depth analysis. Based on the experiments and analysis, we propose ParaScore, a new evaluation metric for paraphrase generation. It possesses the merits of both reference-based and reference-free metrics and explicitly models lexical divergence. Experimental results demonstrate that ParaScore significantly outperforms existing metrics.