Paper Title
Cross-modal Memory Networks for Radiology Report Generation
Paper Authors
Paper Abstract
Medical imaging plays a significant role in the clinical practice of medical diagnosis, where text reports of the images are essential for understanding them and facilitating later treatments. Automatically generating such reports helps lighten the burden on radiologists and significantly promotes clinical automation, and has therefore attracted much attention in applying artificial intelligence to the medical domain. Previous studies mainly follow the encoder-decoder paradigm and focus on the text generation aspect, with few considering the importance of cross-modal mappings or explicitly exploiting such mappings to facilitate radiology report generation. In this paper, we propose a cross-modal memory network (CMN) to enhance the encoder-decoder framework for radiology report generation, where a shared memory is designed to record the alignment between images and texts so as to facilitate interaction and generation across modalities. Experimental results illustrate the effectiveness of our proposed model, which achieves state-of-the-art performance on two widely used benchmark datasets, i.e., IU X-Ray and MIMIC-CXR. Further analyses also show that our model is able to better align information from radiology images and texts, helping generate more accurate reports in terms of clinical indicators.
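The abstract's core mechanism, a shared memory queried by both modalities, can be illustrated with a minimal PyTorch sketch. The slot count, dimensionality, top-k selection size, and all class and parameter names below are illustrative assumptions, not taken from the paper; the paper's actual design may combine memory responses with the features differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalMemory(nn.Module):
    """Minimal sketch of a shared cross-modal memory: both visual and
    textual features query the same learned memory matrix, so its slots
    can come to encode alignments between the two modalities.
    Hyperparameters here are illustrative, not the paper's settings."""

    def __init__(self, num_slots: int = 512, dim: int = 512, top_k: int = 32):
        super().__init__()
        # Shared memory matrix, queried by features from both modalities.
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.top_k = top_k

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq_len, dim) -- either visual patch features
        # or textual token embeddings; both read the same memory.
        scores = feats @ self.memory.t()                  # (B, L, num_slots)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)           # attend over top-k slots
        slots = self.memory[top_idx]                      # (B, L, top_k, dim)
        response = (weights.unsqueeze(-1) * slots).sum(dim=-2)
        # Memory-augmented features are passed on to the encoder/decoder.
        return feats + response
```

In such a design, a single instance of this module would sit between the visual encoder and the text decoder, so that both modalities read from (and, through gradients, update) the same memory slots; that sharing is what would let the slots record cross-modal alignments as the abstract describes.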