Paper title
Inspecting state of the art performance and NLP metrics in image-based medical report generation
Paper authors
Abstract
Several deep learning architectures have been proposed in recent years to address the problem of generating a written report from an imaging exam given as input. Most works evaluate the generated reports using standard Natural Language Processing (NLP) metrics (e.g., BLEU, ROUGE) and report significant progress. In this article, we put this progress in context by comparing state-of-the-art (SOTA) models against weak baselines. We show that simple and even naive approaches yield near-SOTA performance on most traditional NLP metrics. We conclude that evaluation methods for this task should be studied further so that they correctly measure clinical accuracy, ideally with physicians contributing to this effort.