How to Measure the Reproducibility of System-oriented IR Experiments

Paper Title

How to Measure the Reproducibility of System-oriented IR Experiments

Authors

Timo Breuer, Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Philipp Schaer, Ian Soboroff

Abstract

Replicability and reproducibility of experimental results are primary concerns in all the areas of science and IR is not an exception. Besides the problem of moving the field towards more reproducible experimental practices and protocols, we also face a severe methodological issue: we do not have any means to assess when reproduced is reproduced. Moreover, we lack any reproducibility-oriented dataset, which would allow us to develop such methods. To address these issues, we compare several measures to objectively quantify to what extent we have replicated or reproduced a system-oriented IR experiment. These measures operate at different levels of granularity, from the fine-grained comparison of ranked lists, to the more general comparison of the obtained effects and significant differences. Moreover, we also develop a reproducibility-oriented dataset, which allows us to validate our measures and which can also be used to develop future measures.
