Paper Title
Temperate Fish Detection and Classification: a Deep Learning based Approach
Paper Authors
Abstract
Underwater cameras are used extensively in a wide range of marine-ecology applications. However, to process the vast amount of data generated efficiently, we need tools that can automatically detect and recognize the species captured on film. Classifying fish species from videos and images taken in natural environments is challenging because of noise and variation in illumination and the surrounding habitat. In this paper, we propose a two-step deep learning approach for the detection and classification of temperate fishes without pre-filtering. The first step is to detect each individual fish in an image, independent of species and sex; for this purpose, we employ the You Only Look Once (YOLO) object detection technique. In the second step, we adopt a Convolutional Neural Network (CNN) with the Squeeze-and-Excitation (SE) architecture to classify each detected fish, again without pre-filtering. We apply transfer learning to overcome the limited number of training samples of temperate fishes and to improve classification accuracy. This is done by pre-training the object detection model on ImageNet and the fish classifier on a public dataset (Fish4Knowledge), whereupon both the object detector and the classifier are fine-tuned on the temperate fishes of interest. The weights obtained from pre-training are used to initialize the post-training. Our solution achieves state-of-the-art accuracy of 99.27\% on the pre-training dataset. Post-training accuracy is 83.68\% and 87.74\% with and without image augmentation, respectively, indicating that the solution is viable given a more extensive dataset.
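As a minimal, illustrative sketch (not the authors' released code), the two-step pipeline described in the abstract could be wired together roughly as follows in PyTorch. The SEBlock module, the detect_and_classify helper, the detector and classifier callables, and all parameter values are assumptions introduced for illustration only.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: recalibrate channels via global pooling and gating."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average pooling
        self.fc = nn.Sequential(             # excitation: bottleneck MLP with sigmoid gate
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)  # per-channel weights in (0, 1)
        return x * w                                            # rescale feature maps channel-wise


def detect_and_classify(image, detector, classifier, crop_size=224):
    """Step 1: detect fish (species-agnostic); step 2: classify each detected crop.

    `detector` is assumed to return a list of (x1, y1, x2, y2) pixel boxes for a
    batched image tensor of shape (1, C, H, W); `classifier` is any CNN (e.g. one
    using SEBlock) that maps a crop to species logits.
    """
    boxes = detector(image)
    predictions = []
    for (x1, y1, x2, y2) in boxes:
        crop = image[:, :, y1:y2, x1:x2]                                   # cut out one fish
        crop = nn.functional.interpolate(crop, size=(crop_size, crop_size))  # resize for the classifier
        predictions.append(classifier(crop).argmax(dim=1))                 # species label per crop
    return boxes, predictions
```

In this sketch, transfer learning would amount to loading the pre-trained weights (ImageNet for the detector, Fish4Knowledge for the classifier) before fine-tuning both networks on the temperate fish images of interest.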