Paper title
AVA: an Automatic eValuation Approach to Question Answering Systems
Paper authors
Paper abstract
We introduce AVA, an automatic evaluation approach for Question Answering, which, given a set of questions associated with Gold Standard answers, can estimate system Accuracy. AVA uses Transformer-based language models to encode the question, answer, and reference text. This allows for effectively measuring the similarity between the reference and an automatic answer, biased towards the question semantics. To design, train, and test AVA, we built multiple large training, development, and test sets on both public and industrial benchmarks. Our innovative solutions achieve up to 74.7% F1 score in predicting human judgement for single answers. Additionally, AVA can be used to evaluate overall system Accuracy with an RMSE ranging from 0.02 to 0.09, depending on the availability of multiple references.
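The abstract describes AVA's core interface: score a candidate answer against a reference answer, conditioned on the question, and threshold that score to predict a human correctness judgement. The sketch below illustrates only that interface, with a toy bag-of-words encoder standing in for AVA's Transformer-based language model; the function names and the 0.5 threshold are illustrative assumptions, not the paper's actual implementation.

```python
import math
import re
from collections import Counter

def encode(text):
    # Toy bag-of-words vector. AVA encodes text with a
    # Transformer-based language model; this stand-in is
    # purely illustrative.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def ava_score(question, reference, candidate):
    # Bias the similarity toward the question semantics by
    # encoding the question together with each answer.
    ref_vec = encode(question + " " + reference)
    cand_vec = encode(question + " " + candidate)
    return cosine(ref_vec, cand_vec)

def is_correct(question, reference, candidate, threshold=0.5):
    # Binary judgement, mirroring the task of predicting a
    # human correctness label for a single answer.
    return ava_score(question, reference, candidate) >= threshold

q = "Who wrote Hamlet?"
ref = "Hamlet was written by William Shakespeare."
print(is_correct(q, ref, "William Shakespeare wrote Hamlet."))  # True
print(is_correct(q, ref, "It is a city in France."))            # False
```

System-level Accuracy would then be estimated by averaging these per-answer judgements over a question set and comparing against human annotation, which is where the reported RMSE of 0.02 to 0.09 applies.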