Biomedical Concept Relatedness -- A large EHR-based benchmark

Paper title

Biomedical Concept Relatedness -- A large EHR-based benchmark

Authors

Claudia Schulz, Josh Levy-Kramer, Camille Van Assel, Miklos Kepes, Nils Hammerla

Abstract

A promising application of AI to healthcare is the retrieval of information from electronic health records (EHRs), e.g. to aid clinicians in finding relevant information for a consultation or to recruit suitable patients for a study. This requires search capabilities far beyond simple string matching, including the retrieval of concepts (diagnoses, symptoms, medications, etc.) related to the one in question. The suitability of AI methods for such applications is tested by predicting the relatedness of concepts with known relatedness scores. However, all existing biomedical concept relatedness datasets are notoriously small and consist of hand-picked concept pairs. We open-source a novel concept relatedness benchmark overcoming these issues: it is six times larger than existing datasets and concept pairs are chosen based on co-occurrence in EHRs, ensuring their relevance for the application of interest. We present an in-depth analysis of our new dataset and compare it to existing ones, highlighting that it is not only larger but also complements existing datasets in terms of the types of concepts included. Initial experiments with state-of-the-art embedding methods show that our dataset is a challenging new benchmark for testing concept relatedness models.
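
The evaluation idea in the abstract (predicting relatedness for concept pairs with known scores) can be illustrated with a minimal sketch. It assumes that a model's prediction is the cosine similarity of two concept embeddings and that agreement with the reference scores is measured by Spearman rank correlation; the abstract names neither choice, so both are assumptions. The concept names, vectors, and scores are hypothetical toy values, not taken from the benchmark.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embeddings, scored_pairs):
    """Correlate predicted similarities with reference relatedness scores."""
    predicted = [cosine(embeddings[a], embeddings[b]) for a, b, _ in scored_pairs]
    reference = [score for _, _, score in scored_pairs]
    return spearman(predicted, reference)


# Hypothetical toy embeddings and reference scores (illustration only).
toy_embeddings = {
    "concept_a": [1.0, 0.0],
    "concept_b": [0.9, 0.1],
    "concept_c": [0.0, 1.0],
    "concept_d": [0.5, 0.5],
}
toy_pairs = [
    ("concept_a", "concept_b", 4.5),
    ("concept_a", "concept_d", 3.0),
    ("concept_a", "concept_c", 1.0),
]
rho = evaluate(toy_embeddings, toy_pairs)
```

A rank correlation is used in this sketch because it only requires the model to order pairs the same way as the reference scores, not to reproduce their scale.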
