Paper Title

PetroGAN: A novel GAN-based approach to generate realistic, label-free petrographic datasets

Authors

Ferreira, I., Ochoa, L., Koeshidayatullah, A.

Abstract

Deep learning architectures have enriched data analytics in the geosciences, complementing traditional approaches to geological problems. Although deep learning applications in the geosciences show encouraging signs, their actual potential remains untapped. This is primarily because geological datasets, particularly petrographic ones, are limited, time-consuming, and expensive to obtain, and require in-depth knowledge to produce a high-quality labeled dataset. We approached these issues by developing a novel deep learning framework based on generative adversarial networks (GANs) to create the first realistic synthetic petrographic dataset. The StyleGAN2 architecture was selected to allow robust replication of statistical and aesthetic characteristics and to improve the internal variance of petrographic data. The training dataset consists of 10,070 images of rock thin sections in both plane- and cross-polarized light. The algorithm was trained for 264 GPU hours and reached a state-of-the-art Fréchet Inception Distance (FID) score of 12.49 for petrographic images. We further observed that the FID values vary with lithology type and image resolution. Our survey established that subject matter experts found the generated images indistinguishable from real images. This study highlights that GANs are a powerful method for generating realistic synthetic data and experimenting with the latent space, and a promising future tool for self-labelling, reducing the effort of creating geological datasets.
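
For readers unfamiliar with the FID metric quoted in the abstract, the following is a minimal sketch of the Fréchet distance between two sets of feature vectors, each summarized by its mean and covariance. It is not the authors' evaluation code. In a real FID computation the features are activations of a pretrained Inception network; here random vectors stand in for them, and the function name `frechet_distance` is a hypothetical one chosen for illustration.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (n_samples, n_features). In actual FID
    these would be Inception activations; that step is assumed, not shown.
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_f = np.atleast_2d(np.cov(feats_fake, rowvar=False))

    # tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # covariance matrices up to numerical noise (hence the clip).
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)


# Stand-in features: random vectors instead of Inception activations.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
fake = real + 2.0  # same covariance, mean shifted by 2 in each of 4 dimensions

same_score = frechet_distance(real, real)     # close to 0
shifted_score = frechet_distance(real, fake)  # close to 4 * 2**2 = 16
```

Lower values mean the two feature distributions are closer, which is why a lower FID is read as generated images being statistically more similar to the real ones.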
