Latent Graph Representations for Critical View of Safety Assessment

Paper Title

Latent Graph Representations for Critical View of Safety Assessment

Authors

Aditya Murali, Deepak Alapatt, Pietro Mascagni, Armine Vardazaryan, Alain Garcia, Nariaki Okamoto, Didier Mutter, Nicolas Padoy

Abstract

Assessing the critical view of safety in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. Prior works have approached this task by including semantic segmentation as an intermediate step, using predicted segmentation masks to then predict the CVS. While these methods are effective, they rely on extremely expensive ground-truth segmentation annotations and tend to fail when the predicted segmentation is incorrect, limiting generalization. In this work, we propose a method for CVS prediction wherein we first represent a surgical image using a disentangled latent scene graph, then process this representation using a graph neural network. Our graph representations explicitly encode semantic information - object location, class information, geometric relations - to improve anatomy-driven reasoning, as well as visual features to retain differentiability and thereby provide robustness to semantic errors. Finally, to address annotation cost, we propose to train our method using only bounding box annotations, incorporating an auxiliary image reconstruction objective to learn fine-grained object boundaries. We show that our method not only outperforms several baseline methods when trained with bounding box annotations, but also scales effectively when trained with segmentation masks, maintaining state-of-the-art performance.
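
The pipeline described in the abstract (objects as graph nodes, geometric relations as edges, a graph neural network on top) can be illustrated with a small forward-pass sketch. This is not the authors' implementation. It is a toy, standard-library-only example with untrained random weights, and every name in it (`build_graph`, `message_pass`, `predict_cvs`), the edge features (center offset and IoU), the mean aggregation, and the three output scores (assumed here to be one per CVS criterion) are illustrative assumptions rather than details taken from the paper.

```python
import math
import random

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def build_graph(boxes, class_ids, visual_feats, num_classes):
    """Nodes hold box + one-hot class + visual features; edges hold geometry."""
    nodes = []
    for box, cid, feat in zip(boxes, class_ids, visual_feats):
        one_hot = [1.0 if k == cid else 0.0 for k in range(num_classes)]
        nodes.append(list(box) + one_hot + list(feat))
    edges = {}
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i != j:
                # Hypothetical edge features: relative center offset and overlap.
                dx = (b[0] + b[2] - a[0] - a[2]) / 2.0
                dy = (b[1] + b[3] - a[1] - a[3]) / 2.0
                edges[(i, j)] = [dx, dy, iou(a, b)]
    return nodes, edges

def message_pass(nodes, edges):
    """One round of mean aggregation over neighbor and edge features."""
    updated = []
    for i, h in enumerate(nodes):
        msgs = [nodes[j] + e for (src, j), e in edges.items() if src == i]
        if msgs:
            mean = [sum(col) / len(msgs) for col in zip(*msgs)]
        else:
            mean = [0.0] * (len(h) + 3)  # 3 = number of edge features
        updated.append(h + mean)
    return updated

def predict_cvs(nodes, weights, bias):
    """Mean-pool node states, then apply a linear layer and a sigmoid."""
    pooled = [sum(col) / len(nodes) for col in zip(*nodes)]
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Toy usage: three made-up detections with random 4-d visual features.
random.seed(0)
boxes = [(0.1, 0.2, 0.4, 0.6), (0.3, 0.3, 0.7, 0.8), (0.5, 0.1, 0.9, 0.4)]
class_ids = [0, 1, 2]
visual = [[random.random() for _ in range(4)] for _ in boxes]
nodes, edges = build_graph(boxes, class_ids, visual, num_classes=3)
states = message_pass(nodes, edges)
weights = [[random.gauss(0.0, 0.1) for _ in range(len(states[0]))]
           for _ in range(3)]
scores = predict_cvs(states, weights, [0.0, 0.0, 0.0])
```

Keeping the visual feature vector on each node next to the box and class entries mirrors the abstract's point that the representation carries both semantic and visual information. A trainable version would replace the plain lists with tensors in an automatic-differentiation framework so that the whole path stays differentiable.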
