Paper Title
Open-Domain Dialog Evaluation using Follow-Ups Likelihood
Paper Authors
Paper Abstract
Automatic evaluation of open-domain dialogs remains an unsolved problem. Moreover, existing methods do not correlate strongly with human annotations. This paper presents a new automated evaluation method using follow-ups: we measure the probability that a language model will continue the conversation with a fixed set of follow-ups (e.g., "Not really relevant here.", "What are you trying to say?"). Compared against twelve existing methods, our new evaluation achieves the highest correlation with human evaluations.
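To make the scoring idea concrete, here is a minimal sketch of follow-up likelihood evaluation, assuming a DialoGPT-style causal language model scored via Hugging Face transformers. The model name (microsoft/DialoGPT-medium), the wording of the negative follow-ups, and the averaging scheme are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of follow-up likelihood scoring for a dialog response.
# Assumptions (not necessarily the paper's setup): DialoGPT as the language
# model, a two-item negative follow-up list, and mean log-likelihood scoring.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "microsoft/DialoGPT-medium"  # assumed model choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# Fixed follow-ups that signal a poor response (illustrative wording).
NEGATIVE_FOLLOWUPS = [
    "That's not really relevant here.",
    "What are you trying to say?",
]

def followup_log_likelihood(context: str, followup: str) -> float:
    """Mean log-probability of the follow-up tokens given the dialog context."""
    ctx_ids = tokenizer.encode(context + tokenizer.eos_token, return_tensors="pt")
    fu_ids = tokenizer.encode(followup, return_tensors="pt")
    input_ids = torch.cat([ctx_ids, fu_ids], dim=-1)
    labels = input_ids.clone()
    labels[:, : ctx_ids.size(-1)] = -100  # ignore context positions in the loss
    with torch.no_grad():
        out = model(input_ids, labels=labels)
    return -out.loss.item()  # loss is the mean negative log-likelihood

def dialog_quality_score(context: str) -> float:
    """Higher likelihood of negative follow-ups implies a worse response,
    so the mean follow-up likelihood is negated to obtain a quality score."""
    lls = [followup_log_likelihood(context, fu) for fu in NEGATIVE_FOLLOWUPS]
    return -sum(lls) / len(lls)

# Example: a dialog whose last response is incoherent should score low,
# because the negative follow-ups become likely continuations.
context = "How was your weekend?" + tokenizer.eos_token + "Banana seven airplane."
print(dialog_quality_score(context))
```

Under this sketch, turns are separated by the tokenizer's end-of-sequence token, as is conventional for DialoGPT; the paper's actual follow-up set and aggregation may differ.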