Understanding and Predicting Characteristics of Test Collections in Information Retrieval

Paper Title

Understanding and Predicting Characteristics of Test Collections in Information Retrieval

Authors

Md Mustafizur Rahman, Mucahid Kutlu, Matthew Lease

Abstract

Research community evaluations in information retrieval, such as NIST's Text REtrieval Conference (TREC), build reusable test collections by pooling document rankings submitted by many teams. Naturally, the quality of the resulting test collection thus greatly depends on the number of participating teams and the quality of their submitted runs. In this work, we investigate: i) how the number of participants, coupled with other factors, affects the quality of a test collection; and ii) whether the quality of a test collection can be inferred prior to collecting relevance judgments from human assessors. Experiments conducted on six TREC collections illustrate how the number of teams interacts with various other factors to influence the resulting quality of test collections. We also show that the reusability of a test collection can be predicted with high accuracy when the same document collection is used for successive years in an evaluation campaign, as is common in TREC.
