Paper Title
Scalable learning for bridging the species gap in image-based plant phenotyping
Paper Authors
Abstract
The traditional paradigm of applying deep learning -- collect, annotate and train on data -- is not applicable to image-based plant phenotyping, as almost 400,000 different plant species exist. Data costs include growing physical samples, then imaging and labelling them. Model performance suffers from the species gap between the domains of different plant species: a model trained on one species does not generalise and may not transfer to unseen species. In this paper, we investigate the use of synthetic data for leaf instance segmentation. We study multiple synthetic-data training regimes for Mask R-CNN when little or no annotated real data is available. We also present UPGen: a Universal Plant Generator for bridging the species gap. UPGen leverages domain randomisation to produce widely distributed data samples and models stochastic biological variation. Our methods outperform standard practices, such as transfer learning from publicly available plant data, by 26.6% and 51.46% on two unseen plant species respectively. We benchmark UPGen by competing in the CVPPP Leaf Segmentation Challenge, setting a new state of the art with a mean of 88% across the A1-4 test datasets. This study is applicable to the use of synthetic data for automating the measurement of phenotypic traits. Our synthetic dataset and pretrained model are available at https://csiro-robotics.github.io/UPGen_Webpage/.
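A minimal sketch of the domain-randomisation idea the abstract describes: each synthetic scene's parameters are drawn from wide distributions so the generated data spans many plant appearances. The parameter names and ranges below are hypothetical placeholders, not UPGen's actual configuration.

```python
import random

# Hypothetical parameter ranges (illustrative only, not UPGen's real config).
# Wide ranges are the point: broadly distributed samples help the trained
# model cover the appearance of unseen plant species.
PARAM_RANGES = {
    "leaf_count": (4, 30),        # integer number of leaves per plant
    "leaf_scale": (0.5, 2.0),     # relative leaf size
    "leaf_hue": (60.0, 150.0),    # HSV hue in degrees (greens)
    "camera_height": (0.3, 1.5),  # metres above the plant
}

def sample_scene_params(rng=random):
    """Draw one randomised parameter set for a synthetic plant scene."""
    params = {}
    for name, (lo, hi) in PARAM_RANGES.items():
        if name == "leaf_count":
            params[name] = rng.randint(lo, hi)  # inclusive integer draw
        else:
            params[name] = rng.uniform(lo, hi)
    return params

# Each call yields a differently randomised scene specification, which a
# renderer would then turn into an image and instance-segmentation labels.
scene = sample_scene_params()
```

In a full pipeline, a sampler like this would feed a 3D renderer, and the known leaf geometry would provide pixel-perfect instance masks for free.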