Paper title

Warwick Electron Microscopy Datasets

Author

Ede, Jeffrey M.

Abstract

Large, carefully partitioned datasets are essential to train neural networks and standardize performance benchmarks. As a result, we have set up new repositories to make our electron microscopy datasets available to the wider community. There are three main datasets containing 19769 scanning transmission electron micrographs, 17266 transmission electron micrographs, and 98340 simulated exit wavefunctions, and multiple variants of each dataset for different applications. To visualize image datasets, we trained variational autoencoders to encode data as 64-dimensional multivariate normal distributions, which we cluster in two dimensions by t-distributed stochastic neighbor embedding. In addition, we have improved dataset visualization with variational autoencoders by introducing encoding normalization and regularization, adding an image gradient loss, and extending t-distributed stochastic neighbor embedding to account for encoded standard deviations. Our datasets, source code, pretrained models, and interactive visualizations are openly available at https://github.com/Jeffrey-Ede/datasets.
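
The visualization pipeline summarized above, which encodes each image as a 64-dimensional multivariate normal distribution and then embeds the encodings in two dimensions, can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the function names, the diagonal covariance, the use of log-variances, and especially the variance-weighted distance are illustrative choices. The abstract does not specify how t-SNE was extended to account for encoded standard deviations.

```python
import numpy as np

LATENT_DIM = 64  # the abstract states that encodings are 64-dimensional


def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) with the reparameterization trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def variance_aware_distances(mu, sigma):
    """Hypothetical pairwise distance between diagonal Gaussian encodings.

    Each squared difference of means is divided by the summed variances, so
    dimensions where either encoding is uncertain contribute less. This is an
    illustrative assumption, not necessarily the paper's formulation.
    """
    diff2 = (mu[:, None, :] - mu[None, :, :]) ** 2
    var_sum = sigma[:, None, :] ** 2 + sigma[None, :, :] ** 2
    return np.sqrt(np.sum(diff2 / var_sum, axis=-1))


# Usage with random stand-ins for encoder outputs (no real data involved).
rng = np.random.default_rng(0)
n = 5
mu = rng.standard_normal((n, LATENT_DIM))
log_var = rng.normal(-1.0, 0.1, (n, LATENT_DIM))
sigma = np.exp(0.5 * log_var)

z = reparameterize(mu, log_var, rng)        # one sampled encoding per image
kl = kl_to_standard_normal(mu, log_var)     # per-image KL regularization term
dist = variance_aware_distances(mu, sigma)  # (n, n) symmetric distance matrix
```

A precomputed matrix such as `dist` could then be passed to scikit-learn's `TSNE` with `metric="precomputed"` and `init="random"` to obtain a two-dimensional embedding; dividing by the summed variances is one simple way to let uncertain encodings sit closer together, chosen here only for illustration.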