Paper title
What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way
Paper authors
Paper abstract
From 2017 to 2019, the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite the many performance measurements carried out in these evaluation campaigns, the scientific community remains uncertain about the impact that individual system features and their weights have on overall system performance. To close this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we used the gold standard data from the three TREC-PM installments and measured the effectiveness of the different features with the commonly shared infNDCG metric.
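To make the role of the BM25 parameters concrete, the following is a minimal sketch of textbook BM25 scoring. It is not the authors' search engine: the function name `bm25_score`, the toy corpus, the default values `k1=1.2` and `b=0.75`, and the Lucene-style IDF variant are all assumptions chosen for illustration. Here `k1` controls term-frequency saturation and `b` controls document-length normalization, which are the kind of parameters a configuration tool such as SMAC could tune.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query with textbook BM25.

    corpus is a list of tokenized documents, used for document
    frequencies and the average document length. k1 and b are the
    two BM25 parameters (illustrative defaults, not tuned values).
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue  # term contributes nothing to this document
        # Lucene-style IDF variant (an assumption), always positive
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length-normalized term-frequency saturation
        norm = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


# Hypothetical toy corpus, for illustration only
corpus = [
    ["braf", "mutation", "melanoma", "treatment"],
    ["clinical", "trial", "for", "lung", "cancer"],
    ["braf", "inhibitor", "trial"],
]
query = ["braf", "melanoma"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
```

In this toy setup the first document, which matches both query terms, scores highest, and the second, which matches neither, scores zero. Setting `b=0.0` switches off length normalization, which is one simple way to see the effect of a single parameter in isolation, in the spirit of an ablation.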