Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes

Paper Title

Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes

Authors

Vithya Yogarajan, Jacob Montiel, Tony Smith, Bernhard Pfahringer

Abstract

Machine learning-based multi-label medical text classification can be used to enhance the understanding of the human body and aid patient care. We present a broad study on clinical natural language processing techniques to maximise a feature representing text when predicting medical codes for patients with multi-morbidity. We present results of multi-label medical text classification problems with 18, 50 and 155 labels. We compare several variations of embeddings, text tagging, and pre-processing. For imbalanced data, we show that labels which occur infrequently benefit the most from additional features incorporated in embeddings. We also show that high-dimensional embeddings pre-trained using health-related data present a significant improvement in a multi-label setting, similar to the way they improve performance for binary classification. High-dimensional embeddings from this research are made available for public use.
