Paper Title

Quantification of BERT Diagnosis Generalizability Across Medical Specialties Using Semantic Dataset Distance

Paper Authors

Mihir P. Khambete, William Su, Juan Garcia, Marcus A. Badgeley

Paper Abstract

Deep learning models in healthcare may fail to generalize on data from unseen corpora. Additionally, no quantitative metric exists to tell how existing models will perform on new data. Previous studies demonstrated that NLP models of medical notes generalize variably between institutions, but ignored other levels of healthcare organization. We measured SciBERT diagnosis sentiment classifier generalizability between medical specialties using EHR sentences from MIMIC-III. Models trained on one specialty performed better on internal test sets than mixed or external test sets (mean AUCs 0.92, 0.87, and 0.83, respectively; p = 0.016). When models are trained on more specialties, they have better test performances (p < 1e-4). Model performance on new corpora is directly correlated to the similarity between train and test sentence content (p < 1e-4). Future studies should assess additional axes of generalization to ensure deep learning models fulfil their intended purpose across institutions, specialties, and practices.
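The abstract reports that model performance on a new corpus correlates directly with the similarity between train and test sentence content, but does not spell out the distance metric here. The sketch below illustrates the general idea with a simple bag-of-words cosine similarity between pooled corpora; the paper itself presumably operates on semantic (e.g. SciBERT-derived) representations rather than raw token counts, and all corpus names and example sentences below are hypothetical, not drawn from MIMIC-III.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    """Lowercase word tokens; a stand-in for a real clinical tokenizer."""
    return re.findall(r"[a-z]+", sentence.lower())

def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(n * counts_b[t] for t, n in counts_a.items() if t in counts_b)
    norm_a = math.sqrt(sum(n * n for n in counts_a.values()))
    norm_b = math.sqrt(sum(n * n for n in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def corpus_similarity(train_sentences, test_sentences):
    """Pool token counts per corpus, then compare the two distributions."""
    train_counts = Counter(t for s in train_sentences for t in tokenize(s))
    test_counts = Counter(t for s in test_sentences for t in tokenize(s))
    return cosine_similarity(train_counts, test_counts)

# Hypothetical specialty corpora (illustrative only).
cardiology = ["chest pain radiating to the left arm",
              "elevated troponin suggests myocardial infarction"]
neurology = ["acute ischemic stroke seen on imaging",
             "new onset focal seizures this morning"]
internal_medicine = ["troponin elevated, concerning for myocardial infarction"]

# A train specialty that shares vocabulary with the test corpus scores higher,
# mirroring the abstract's claim that closer corpora yield better transfer.
print(corpus_similarity(cardiology, internal_medicine))
print(corpus_similarity(neurology, internal_medicine))
```

In this toy example the cardiology corpus overlaps the internal-medicine test sentence (troponin, myocardial infarction) while the neurology corpus shares no content words, so the first similarity exceeds the second; the paper's finding is that such train/test similarity tracks classifier AUC on the unseen corpus.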
