Paper title

Uncertainties associated with GAN-generated datasets in high energy physics

Authors

Konstantin T. Matchev, Alexander Roman, Prasanth Shyamsundar

Abstract

Recently, Generative Adversarial Networks (GANs) trained on samples of traditionally simulated collider events have been proposed as a way of generating larger simulated datasets at a reduced computational cost. In this paper we point out that data generated by a GAN cannot statistically be better than the data it was trained on, and critically examine the applicability of GANs in various situations, including a) for replacing the entire Monte Carlo pipeline or parts of it, and b) to produce datasets for usage in highly sensitive analyses or sub-optimal ones. We present our arguments using information theoretic demonstrations, a toy example, as well as in the form of a formal statement, and identify some potential valid uses of GANs in collider simulations.
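
The central claim, that generated data cannot be statistically better than the training data, can be illustrated with a minimal sketch. This is not the paper's own toy example: a Gaussian fitted to the training sample stands in for a GAN, and the function name, sample sizes, and seed are assumptions chosen for illustration. The sketch trains on 100 events, generates 5,000, and measures how much the mean of the generated sample fluctuates from one training set to the next.

```python
import random
import statistics


def generated_mean_spread(n_train=100, n_gen=5_000, n_trials=200, seed=0):
    """Return the trial-to-trial spread of the mean of generated samples.

    Each trial draws n_train events from the true N(0, 1) distribution,
    fits a Gaussian "generator" to them (a stand-in for a GAN, by assumption),
    draws n_gen events from the fitted generator, and records their mean.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        train = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
        mu_hat = statistics.fmean(train)      # fitted location
        sigma_hat = statistics.stdev(train)   # fitted width
        generated = [rng.gauss(mu_hat, sigma_hat) for _ in range(n_gen)]
        means.append(statistics.fmean(generated))
    return statistics.stdev(means)


spread = generated_mean_spread()
naive = 1.0 / 5_000 ** 0.5  # error if the generated events were independent truth-level draws
floor = 1.0 / 100 ** 0.5    # error set by the size of the training sample

print(f"observed spread of the generated-sample mean: {spread:.4f}")
print(f"naive 1/sqrt(n_gen) expectation:              {naive:.4f}")
print(f"training-sample floor 1/sqrt(n_train):        {floor:.4f}")
```

In this setup the observed spread stays close to the training-sample floor rather than the naive expectation, because every generated event inherits the statistical error of the fitted parameters. Generating more events shrinks only the sampling term, not the error carried over from training.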
