An Evaluation of Two Commercial Deep Learning-Based Information Retrieval Systems for COVID-19 Literature

Paper title

An Evaluation of Two Commercial Deep Learning-Based Information Retrieval Systems for COVID-19 Literature

Authors

Sarvesh Soni, Kirk Roberts

Abstract

The COVID-19 pandemic has resulted in a tremendous need for access to the latest scientific information, primarily through the use of text mining and search tools. This has led to both corpora for biomedical articles related to COVID-19 (such as the CORD-19 corpus (Wang et al., 2020)) as well as search engines to query such data. While most research in search engines is performed in the academic field of information retrieval (IR), most academic search engines, though rigorously evaluated, are sparsely utilized, while major commercial web search engines (e.g., Google, Bing) dominate. This relates to COVID-19 because it can be expected that commercial search engines deployed for the pandemic will gain much higher traction than those produced in academic labs, and thus leads to questions about the empirical performance of these search tools. This paper seeks to empirically evaluate two such commercial search engines for COVID-19, produced by Google and Amazon, in comparison to the more academic prototypes evaluated in the context of the TREC-COVID track (Roberts et al., 2020). We performed several steps to reduce bias in the available manual judgments in order to ensure a fair comparison of the two systems with those submitted to TREC-COVID. We find that the top-performing system from TREC-COVID on bpref metric performed the best among the different systems evaluated in this study on all the metrics. This has implications for developing biomedical retrieval systems for future health crises as well as trust in popular health search engines.
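
The abstract names bpref as one of the evaluation metrics and stresses bias in the available manual judgments. As a rough illustration of why bpref suits that setting, the sketch below computes it for a single topic using one common formulation, in which documents without a relevance judgment are skipped instead of being counted as non-relevant. The function name `bpref`, its arguments, and the toy data are assumptions made for illustration only; this is not the authors' evaluation code.

```python
from typing import Dict, List


def bpref(ranking: List[str], judgments: Dict[str, bool]) -> float:
    """Compute bpref for one topic (illustrative sketch).

    ranking   -- document ids in the order the system returned them
    judgments -- doc id -> True (judged relevant) or False (judged
                 non-relevant); ids absent from this dict are unjudged
    """
    num_rel = sum(1 for rel in judgments.values() if rel)
    num_nonrel = len(judgments) - num_rel
    if num_rel == 0:
        return 0.0  # no judged-relevant documents for this topic

    denom = min(num_rel, num_nonrel)
    nonrel_seen = 0  # judged non-relevant documents ranked so far
    total = 0.0
    for doc_id in ranking:
        if doc_id not in judgments:
            continue  # unjudged: neither rewarded nor penalized
        if judgments[doc_id]:
            if nonrel_seen == 0:
                total += 1.0
            else:
                # Penalize by the share of judged non-relevant documents
                # ranked above this relevant one, capped at num_rel.
                total += 1.0 - min(nonrel_seen, num_rel) / denom
        else:
            nonrel_seen += 1
    return total / num_rel


# Hypothetical toy topic: "u1" and "u2" have no judgment.
toy_judgments = {"d1": True, "d2": False, "d3": True, "d4": True, "d5": False}
toy_ranking = ["d1", "u1", "d2", "d3", "u2", "d4"]
score = bpref(toy_ranking, toy_judgments)
```

Because unjudged documents are skipped, removing `u1` and `u2` from the toy ranking leaves the score unchanged. That property matters when comparing systems whose results did not contribute to the judgment pool.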
