Paper title
High Fidelity Interactive Video Segmentation Using Tensor Decomposition Boundary Loss Convolutional Tessellations and Context Aware Skip Connections
Paper authors
Paper abstract
We present HyperSeg, a high-fidelity deep learning algorithm for interactive video segmentation. It uses a convolutional network with context-aware skip connections, together with compressed hypercolumn image features combined with a convolutional tessellation procedure. To maintain high output fidelity, our model processes and renders all image features at high resolution, without downsampling or pooling. We maintain this consistently high fidelity efficiently through two chief means: (1) we use a statistically principled tensor decomposition procedure to modulate the number of hypercolumn features, and (2) we render these features at their native resolution using a convolutional tessellation technique. To improve pixel-level segmentation results, we introduce a boundary loss function; to improve temporal coherence in video data, we include temporal image information in our model. Through experiments, we demonstrate the improved accuracy of our model over baseline models on interactive segmentation tasks using high-resolution video data. We also introduce a benchmark video segmentation dataset, the VFX Segmentation Dataset, which contains over 27,046 high-resolution video frames, including greenscreen and various composited scenes, with corresponding hand-crafted, pixel-level segmentations. Our work extends state-of-the-art segmentation fidelity to high-resolution data and can be used across a broad range of application domains, including VFX pipelines and medical imaging.
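To make the idea of compressed hypercolumn features concrete, the following is a minimal NumPy sketch, not the HyperSeg implementation. It assumes that feature maps from several layers already share the native spatial resolution (consistent with a model that avoids downsampling), and it substitutes a plain truncated SVD over the channel axis for the paper's tensor decomposition procedure, whose details the abstract does not give. The function names `build_hypercolumns` and `compress_channels`, the layer sizes, and the choice of `k` are all hypothetical.

```python
import numpy as np


def build_hypercolumns(feature_maps):
    """Concatenate (C_i, H, W) feature maps along channels into one (sum C_i, H, W) array.

    Assumption: every map is already at the native resolution, so no resampling is needed.
    """
    shapes = {f.shape[1:] for f in feature_maps}
    if len(shapes) != 1:
        raise ValueError("all feature maps must share the same spatial size")
    return np.concatenate(feature_maps, axis=0)


def compress_channels(hypercolumns, k):
    """Reduce (C, H, W) hypercolumns to (k, H, W) with a truncated SVD.

    This is a stand-in for the paper's tensor decomposition, not a reproduction of it.
    """
    c, h, w = hypercolumns.shape
    flat = hypercolumns.reshape(c, h * w)
    flat = flat - flat.mean(axis=1, keepdims=True)  # center each channel
    u, _, _ = np.linalg.svd(flat, full_matrices=False)
    # Project every pixel's C-dimensional descriptor onto the top-k channel directions.
    return (u[:, :k].T @ flat).reshape(k, h, w)


# Hypothetical activations from three layers, all at 64 x 64 resolution.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((c, 64, 64)) for c in (8, 16, 32)]
hyper = build_hypercolumns(layers)       # 56 channels per pixel
compact = compress_channels(hyper, 12)   # 12 channels per pixel, same resolution
```

The spatial size is untouched throughout: only the per-pixel channel count shrinks, which is the property the abstract emphasizes when it describes modulating the number of hypercolumn features while keeping native resolution.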