Paper Title
CNN-based Local Vision Transformer for COVID-19 Diagnosis
Paper Authors
Paper Abstract
Deep learning can serve as an assistive technology that helps doctors identify COVID-19 infections quickly and accurately. Recently, the Vision Transformer (ViT) has shown great potential for image classification owing to its global receptive field. However, because ViT-based architectures lack the inductive biases inherent to CNNs, they suffer from limited feature richness and are difficult to train. In this paper, we propose a new structure called Transformer for COVID-19 (COVT) to improve the performance of ViT-based architectures on small COVID-19 datasets. It uses a CNN as a feature extractor to effectively extract local structural information, and introduces average pooling into ViT's Multilayer Perceptron (MLP) module to capture global information. Experiments demonstrate the effectiveness of our method on two COVID-19 datasets and the ImageNet dataset.
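The two ideas named in the abstract, a convolutional feature extractor feeding the transformer and average pooling inside the MLP module, can be illustrated with a minimal NumPy sketch. The abstract does not say how the pooling is wired in, so everything below is an assumption made for illustration: a single strided convolution stands in for the CNN, the pooled vector is the mean of the hidden activations over all tokens and is added back to each token, ReLU replaces the GELU usually found in ViT, and the function names `conv_patch_tokens` and `mlp_with_avg_pool` are hypothetical rather than taken from the paper.

```python
import numpy as np


def conv_patch_tokens(image, kernel, stride):
    """Stand-in for the CNN feature extractor (assumption: one strided
    convolution with valid padding). image: (H, W); kernel: (k, k, C).
    Returns one C-dimensional token per spatial position, shape (N, C)."""
    k = kernel.shape[0]
    height, width = image.shape
    tokens = [
        # Contract the k x k patch against the kernel, leaving C channels.
        np.tensordot(image[r:r + k, c:c + k], kernel, axes=([0, 1], [0, 1]))
        for r in range(0, height - k + 1, stride)
        for c in range(0, width - k + 1, stride)
    ]
    return np.stack(tokens)


def mlp_with_avg_pool(tokens, w1, b1, w2, b2):
    """Token-wise MLP with a global average-pooling branch (assumed wiring).
    tokens: (N, C); w1: (C, D); b1: (D,); w2: (D, C); b2: (C,)."""
    hidden = np.maximum(tokens @ w1 + b1, 0.0)        # ReLU for brevity
    global_ctx = hidden.mean(axis=0, keepdims=True)   # average over tokens: (1, D)
    hidden = hidden + global_ctx                      # broadcast summary to every token
    return tokens + hidden @ w2 + b2                  # residual connection


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))                 # toy single-channel input
kernel = rng.standard_normal((4, 4, 8)) * 0.1
tokens = conv_patch_tokens(image, kernel, stride=4)   # 8 x 8 positions, 8 channels

w1 = rng.standard_normal((8, 16)) * 0.1
b1 = np.zeros(16)
w2 = rng.standard_normal((16, 8)) * 0.1
b2 = np.zeros(8)
out = mlp_with_avg_pool(tokens, w1, b1, w2, b2)
print(tokens.shape, out.shape)
```

In a plain token-wise MLP each output row depends only on its own input row; with the pooled term, every output row also depends on the mean over all tokens, which is one simple way an MLP module could pick up global information.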