Paper Title
Real-Time Instrument Segmentation in Robotic Surgery using Auxiliary Supervised Deep Adversarial Learning
Authors
Abstract
Robot-assisted surgery is an emerging technology that has grown rapidly with the development of robotics and imaging systems. Innovations in vision, haptics, and accurate movement of robot arms have enabled surgeons to perform precise minimally invasive surgeries. Real-time semantic segmentation of the robotic instruments and tissues is a crucial step in robot-assisted surgery. Accurate and efficient segmentation of the surgical scene not only aids in the identification and tracking of instruments but also provides contextual information about the different tissues and instruments in use. For this purpose, we have developed a lightweight cascaded convolutional neural network (CNN) to segment the surgical instruments from high-resolution videos obtained from a commercial robotic system. We propose a multi-resolution feature fusion (MFF) module to fuse the feature maps of different dimensions and channels from the auxiliary and main branches. We also introduce a novel way of combining auxiliary loss and adversarial loss to regularize the segmentation model. Auxiliary loss helps the model to learn low-resolution features, and adversarial loss improves the segmentation prediction by learning higher-order structural information. The model also includes a lightweight spatial pyramid pooling (SPP) unit to aggregate rich contextual information in the intermediate stage. We show that our model surpasses existing algorithms for pixel-wise segmentation of surgical instruments in both prediction accuracy and segmentation time on high-resolution videos.
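The abstract does not state how the auxiliary and adversarial losses are combined, so the following is only a minimal sketch of one common formulation: a weighted sum of a main segmentation loss, a low-resolution auxiliary loss, and an adversarial term. The function names, the use of binary cross-entropy, and the weights `lambda_aux` and `lambda_adv` are all assumptions made for illustration; they are not taken from the paper.

```python
import math

EPS = 1e-7  # guards against log(0)


def binary_cross_entropy(probs, targets):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def segmentation_loss(main_probs, aux_probs, mask, aux_mask, disc_scores,
                      lambda_aux=0.4, lambda_adv=0.01):
    """Hypothetical combined objective: main + weighted auxiliary + weighted adversarial.

    main_probs / mask:     full-resolution instrument probabilities and labels
    aux_probs / aux_mask:  low-resolution auxiliary-branch probabilities and labels
    disc_scores:           discriminator outputs on the predicted masks, in (0, 1)
    lambda_aux, lambda_adv: illustrative weights, not values from the paper
    """
    main_term = binary_cross_entropy(main_probs, mask)
    aux_term = binary_cross_entropy(aux_probs, aux_mask)
    # The segmentation network is rewarded when the discriminator
    # scores its predictions as real (label 1).
    adv_term = binary_cross_entropy(disc_scores, [1] * len(disc_scores))
    return main_term + lambda_aux * aux_term + lambda_adv * adv_term


# Toy call: four full-resolution pixels, two auxiliary pixels, one discriminator score.
loss = segmentation_loss([0.9, 0.2, 0.8, 0.1], [0.7, 0.3],
                         [1, 0, 1, 0], [1, 0], [0.4])
```

In a sketch like this, the auxiliary term supervises the low-resolution branch directly, while the adversarial term only nudges the prediction toward masks that a discriminator cannot tell apart from ground truth, which is why it is given a much smaller weight here.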