Paper title
Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation
Paper authors
Abstract
Beyond the common difficulties faced in natural image captioning, medical report generation specifically requires the model to describe a medical image with a fine-grained, semantically coherent paragraph that satisfies both medical commonsense and logic. Previous works generally extract global image features and attempt to generate a paragraph similar to the reference reports; however, this approach has two limitations. First, the regions of primary interest to radiologists usually occupy a small area of the global image, so the remaining parts of the image can be considered irrelevant noise during training. Second, many similar sentences are used across medical reports to describe the normal regions of the image, which causes serious data bias. This bias is likely to teach models to routinely generate these inessential sentences. To address these problems, we propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns. In more detail, ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning. The core structure of ASGK consists of a medical graph encoder and a natural language decoder, inspired by advanced Generative Pre-Training (GPT). Experiments on the CX-CHR dataset and our COVID-19 CT Report dataset demonstrate that the proposed ASGK generates robust and accurate reports and outperforms state-of-the-art methods on both medical terminology classification and paragraph generation metrics.
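The data-bias argument can be made concrete by counting how many reports contain each sentence. This is a hypothetical illustration, not part of ASGK: the three toy reports are invented for the example, and splitting on periods is a simplifying assumption standing in for a proper sentence tokenizer. The function name `sentence_frequencies` is likewise hypothetical.

```python
from collections import Counter


def sentence_frequencies(reports):
    """Count, for each normalized sentence, how many reports contain it (hypothetical helper)."""
    counts = Counter()
    for report in reports:
        # Naive split on periods plus lowercasing; an assumption, not a real tokenizer.
        sentences = {s.strip().lower() for s in report.split(".") if s.strip()}
        # Using a set means each sentence is counted at most once per report.
        counts.update(sentences)
    return counts


# Invented toy reports for illustration only (not taken from CX-CHR or the COVID-19 CT dataset).
reports = [
    "The heart size is normal. The lungs are clear. No pleural effusion.",
    "The heart size is normal. The lungs are clear. Small nodule in the right upper lobe.",
    "The heart size is normal. No pleural effusion. Patchy opacity in the left lower lobe.",
]

freqs = sentence_frequencies(reports)
for sentence, n in freqs.most_common(3):
    print(f"{n}/{len(reports)} reports: {sentence}")
```

In this toy corpus the normal-finding sentence about heart size appears in every report, while each abnormal finding appears only once, which is the kind of imbalance that can push a generator toward emitting the common sentences by default.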