Paper Title

On the Evaluation Metrics for Paraphrase Generation

Authors

Lingfeng Shen, Lemao Liu, Haiyun Jiang, Shuming Shi

Abstract

In this paper, we revisit automatic metrics for paraphrase evaluation and obtain two findings that contradict conventional wisdom: (1) reference-free metrics achieve better performance than their reference-based counterparts, and (2) most commonly used metrics do not align well with human annotation. We explore the underlying reasons behind these findings through additional experiments and in-depth analysis. Based on the experiments and analyses, we propose ParaScore, a new evaluation metric for paraphrase generation. It combines the merits of reference-based and reference-free metrics and explicitly models lexical divergence. Experimental results demonstrate that ParaScore significantly outperforms existing metrics.
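The abstract describes ParaScore as combining reference-based and reference-free semantic signals while explicitly rewarding lexical divergence from the source. A minimal sketch of such a metric is below. This is not the paper's implementation: the similarity function, the `gamma` weight, and the divergence measure are all illustrative stand-ins (the paper uses embedding-based similarity such as BERTScore, whereas this sketch uses `difflib` to stay dependency-free).

```python
# Illustrative sketch of a ParaScore-style metric (NOT the paper's implementation).
# Idea: score = max(sim(candidate, source), sim(candidate, reference))
#               + gamma * lexical_divergence(source, candidate)
# The max() merges the reference-based and reference-free views; the second
# term explicitly rewards surface-form change relative to the source.
from difflib import SequenceMatcher


def sim(a: str, b: str) -> float:
    """Stand-in similarity; the paper uses an embedding-based score instead."""
    return SequenceMatcher(None, a, b).ratio()


def lexical_divergence(source: str, candidate: str) -> float:
    """Normalized token-level difference between source and candidate."""
    matcher = SequenceMatcher(None, source.split(), candidate.split())
    return 1.0 - matcher.ratio()


def parascore_sketch(source: str, candidate: str,
                     reference: str, gamma: float = 0.1) -> float:
    # Take the stronger of the reference-based and reference-free signals.
    semantic = max(sim(candidate, source), sim(candidate, reference))
    # Add a small bonus for diverging lexically from the source
    # (gamma is a hypothetical weight chosen for illustration).
    return semantic + gamma * lexical_divergence(source, candidate)


if __name__ == "__main__":
    src = "the cat sat on the mat"
    cand = "a cat was sitting on the mat"
    ref = "the cat was sitting on the mat"
    print(round(parascore_sketch(src, cand, ref), 3))
```

Note how a candidate identical to the source would still get a high semantic score but a zero divergence bonus, which is the intuition behind penalizing trivial copies as paraphrases.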
