Title
Principled Multi-Aspect Evaluation Measures of Rankings
Authors
Abstract
Information Retrieval evaluation has traditionally focused on defining principled ways of assessing the relevance of a ranked list of documents with respect to a query. Several methods extend this type of evaluation beyond relevance, making it possible to evaluate different aspects of a document ranking (e.g., relevance, usefulness, or credibility) using a single measure (multi-aspect evaluation). However, these methods either (i) are tailor-made for specific aspects and do not extend to other types or numbers of aspects, or (ii) have theoretical anomalies, e.g., they assign the maximum score to a ranking in which every document is labelled with the lowest grade on every aspect (e.g., not relevant, not credible, etc.). We present a theoretically principled multi-aspect evaluation method that can be used for any number, and any type, of aspects. A thorough empirical evaluation using up to 5 aspects and a total of 425 runs officially submitted to 10 TREC tracks shows that our method is more discriminative than the state of the art and overcomes its theoretical limitations.