Paper title
Classification of cancer pathology reports: a large-scale comparative study
Paper authors
Paper abstract
We report on the application of state-of-the-art deep learning techniques to the automatic and interpretable assignment of ICD-O3 topography and morphology codes to free-text cancer reports. We present results on a large dataset (more than 80 000 labeled and 1 500 000 unlabeled anonymized reports, written in Italian and collected from hospitals in Tuscany over more than a decade) with a large number of classes (134 morphological classes and 61 topographical classes). We compare alternative architectures in terms of prediction accuracy and interpretability, and show that our best model achieves a multiclass accuracy of 90.3% on topography site assignment and 84.8% on morphology type assignment. We find that, in this context, hierarchical models are not better than flat models, and that an element-wise maximum aggregator is slightly better than attentive models on site classification. Moreover, the maximum aggregator offers a way to interpret the classification process.
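To illustrate why an element-wise maximum aggregator lends itself to interpretation, the following is a minimal sketch and not the architecture used in the study. It assumes each token of a report has already been mapped to a fixed-length vector; the function names, the example tokens, and the toy vectors are all hypothetical. Because every pooled dimension is copied from exactly one token, the indices of the winning tokens can be counted to suggest which words drive the document representation.

```python
from collections import Counter


def max_aggregate(token_vectors):
    """Element-wise maximum over a list of equal-length token vectors.

    Returns (pooled, argmax), where argmax[d] is the index of the token
    that supplied the maximum in dimension d (the first one on ties).
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    pooled, argmax = [], []
    for d in range(dim):
        column = [vec[d] for vec in token_vectors]
        best = max(range(len(column)), key=column.__getitem__)
        pooled.append(column[best])
        argmax.append(best)
    return pooled, argmax


def token_contributions(tokens, argmax):
    """Count how many pooled dimensions each token position supplied."""
    counts = Counter(argmax)
    return [(tok, counts.get(i, 0)) for i, tok in enumerate(tokens)]


# Hypothetical tokens and toy vectors, invented for illustration only;
# in a real model the vectors would come from a learned encoder.
tokens = ["carcinoma", "duttale", "mammella"]
vectors = [
    [0.9, 0.1, 0.5, 0.0],
    [0.3, 0.2, 0.1, 0.1],
    [0.2, 0.4, 0.7, 0.6],
]

pooled, argmax = max_aggregate(vectors)
contributions = token_contributions(tokens, argmax)
print(pooled)
print(contributions)
```

In this toy case the third token supplies three of the four pooled dimensions, so it would be flagged as the most influential word. An attention-based aggregator instead mixes all tokens through a weighted sum, which makes this kind of per-dimension attribution less direct.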