Semantic Segmentation for Partially Occluded Apple Trees Based on Deep Learning

Paper Title

Semantic Segmentation for Partially Occluded Apple Trees Based on Deep Learning

Authors

Chen, Zijue; Ting, David; Newbury, Rhys; Chen, Chao

Abstract

Fruit tree pruning and fruit thinning require a robust vision system that can provide high-resolution segmentation of fruit trees and their branches. However, recent works either consider only the dormant season, when branches are minimally occluded, or fit a polynomial curve to reconstruct branch shape and hence lose information about branch thickness. In this work, we apply two state-of-the-art supervised learning models, U-Net and DeepLabv3, and a conditional Generative Adversarial Network, Pix2Pix (with and without the discriminator), to segment partially occluded 2D-open-V apple trees. Binary accuracy, mean IoU, boundary F1 score and occluded branch recall were used to evaluate the performance of the models. DeepLabv3 outperforms the other models in binary accuracy, mean IoU and boundary F1 score, but is surpassed by Pix2Pix (without discriminator) and U-Net in occluded branch recall. We define two difficulty indices to quantify the difficulty of the task: (1) the Occlusion Difficulty Index and (2) the Depth Difficulty Index. We analyze the 10 worst images under each difficulty index by means of branch recall and occluded branch recall. U-Net outperforms the other two models on the current metrics. On the other hand, Pix2Pix (without discriminator) provides more information on branch paths, which is not reflected by the metrics. This highlights the need for more specific metrics for recovering occluded information. Furthermore, it shows the usefulness of image-transfer networks for hallucinating structure behind occlusions. Future work is required to further enhance the models so that they recover more information from occlusions, such that this technology can be applied to automating agricultural tasks in a commercial environment.
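As a rough illustration of three of the metrics named in the abstract, the sketch below computes binary accuracy, mean IoU and an occluded branch recall on flattened binary masks (1 = branch, 0 = background). The abstract does not give the paper's exact definitions, so the formulas here are the common textbook ones, and the occluded branch recall in particular is an assumption: recall restricted to ground-truth branch pixels that lie under an occluder. All function names and the toy masks are hypothetical, and boundary F1 score is omitted.

```python
# Minimal sketch of segmentation metrics on flattened binary masks.
# Assumption: 1 = branch pixel, 0 = background; `occluded` marks pixels
# hidden by leaves or fruit. These definitions are illustrative only and
# are not taken from the paper.

def binary_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the ground truth."""
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)

def iou(pred, truth, cls):
    """Intersection over union for a single class label."""
    inter = sum(p == cls and t == cls for p, t in zip(pred, truth))
    union = sum(p == cls or t == cls for p, t in zip(pred, truth))
    return inter / union if union else 1.0

def mean_iou(pred, truth):
    """Average of the background and branch IoU."""
    return (iou(pred, truth, 0) + iou(pred, truth, 1)) / 2

def occluded_branch_recall(pred, truth, occluded):
    """Recall over ground-truth branch pixels that are occluded (assumed definition)."""
    hidden = [p for p, t, o in zip(pred, truth, occluded) if t == 1 and o == 1]
    if not hidden:
        return None  # no occluded branch pixels in this image
    return sum(p == 1 for p in hidden) / len(hidden)

# Toy 8-pixel example (hypothetical data).
truth    = [1, 1, 1, 1, 0, 0, 0, 0]
pred     = [1, 1, 1, 0, 0, 0, 0, 1]
occluded = [0, 0, 1, 1, 0, 0, 1, 0]

print(binary_accuracy(pred, truth))                   # 0.75
print(round(mean_iou(pred, truth), 6))                # 0.6
print(occluded_branch_recall(pred, truth, occluded))  # 0.5
```

In the toy example the overall scores look reasonable while only half of the hidden branch pixels are recovered, which mirrors the abstract's point that global metrics can mask how well a model recovers occluded structure.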
