A Medical Semantic-Assisted Transformer for Radiographic Report Generation

Paper Title

A Medical Semantic-Assisted Transformer for Radiographic Report Generation

Authors

Wang, Zhanyu; Tang, Mingkang; Wang, Lei; Li, Xiu; Zhou, Luping

Abstract

Automated radiographic report generation is a challenging cross-domain task that aims to automatically generate accurate and semantically coherent reports describing medical images. Despite recent progress in this field, many challenges remain, at least in the following respects. First, radiographic images are very similar to each other, so it is difficult to capture fine-grained visual differences when a CNN serves as the visual feature extractor, as in many existing methods. Second, semantic information has been widely applied to boost the performance of generation tasks (e.g., image captioning), but existing methods often fail to provide effective medical semantic features. To address these problems, we propose a memory-augmented sparse attention block that uses bilinear pooling to capture higher-order interactions between the input fine-grained image features while producing sparse attention. Moreover, we introduce a novel Medical Concepts Generation Network (MCGN) to predict fine-grained semantic concepts and incorporate them into the report generation process as guidance. Our proposed method shows promising performance on MIMIC-CXR, the largest recently released benchmark. It outperforms multiple state-of-the-art methods in image captioning and medical report generation.
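
The abstract names three ingredients of the attention block: memory augmentation, bilinear pooling, and sparse attention. The NumPy sketch below is one way these pieces could fit together and is not the authors' implementation. It assumes that bilinear pooling is approximated in low-rank form (the element-wise product of two linear projections), that "memory" means extra learned key/value slots appended to the image features, and that sparsity comes from sparsemax. All names (`bilinear_sparse_attention`, `W_q`, `W_k`, `W_v`, `w_a`, `memory_k`, `memory_v`) are hypothetical.

```python
import numpy as np


def sparsemax(z):
    """Sparsemax over the last axis: Euclidean projection of z onto the
    probability simplex. Unlike softmax, it can return exact zeros."""
    z_sorted = np.sort(z, axis=-1)[..., ::-1]          # descending order
    k = np.arange(1, z.shape[-1] + 1)
    cumsum = np.cumsum(z_sorted, axis=-1)
    support = 1 + k * z_sorted > cumsum                # entries kept in the support
    k_z = support.sum(axis=-1, keepdims=True)          # support size
    tau = (np.take_along_axis(cumsum, k_z - 1, axis=-1) - 1) / k_z
    return np.maximum(z - tau, 0.0)


def bilinear_sparse_attention(query, feats, memory_k, memory_v,
                              W_q, W_k, W_v, w_a):
    """Hypothetical memory-augmented sparse attention step (illustrative only).

    query:    (d,)   decoder-side query vector
    feats:    (n, d) fine-grained image features
    memory_k: (m, h) assumed learned memory keys
    memory_v: (m, h) assumed learned memory values
    W_q, W_k, W_v: (h, d) projection matrices; w_a: (h,) scoring vector
    """
    q = W_q @ query                                               # (h,)
    keys = np.concatenate([feats @ W_k.T, memory_k], axis=0)      # (n+m, h)
    values = np.concatenate([feats @ W_v.T, memory_v], axis=0)    # (n+m, h)
    # Low-rank bilinear pooling: element-wise product of projected query and keys.
    joint = np.maximum(keys * q, 0.0)                             # (n+m, h)
    logits = joint @ w_a                                          # (n+m,)
    weights = sparsemax(logits)                                   # sparse, sums to 1
    return weights @ values, weights
```

Calling `bilinear_sparse_attention` with random inputs returns an attended feature vector of size `h` and a weight vector over the `n` image regions plus `m` memory slots. Because sparsemax projects onto the simplex, some of those weights can be exactly zero, which is the sense of "sparse attention" assumed in this sketch.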
