Paper title
Data-Centric Machine Learning in Quantum Information Science
Paper authors
Abstract
We propose a series of data-centric heuristics for improving the performance of machine learning systems when applied to problems in quantum information science. In particular, we consider how systematic engineering of training sets can significantly enhance the accuracy of pre-trained neural networks used for quantum state reconstruction without altering the underlying architecture. We find that it is not always optimal to engineer training sets to exactly match the expected distribution of a target scenario, and instead, performance can be further improved by biasing the training set to be slightly more mixed than the target. This is due to the heterogeneity in the number of free variables required to describe states of different purity, and as a result, overall accuracy of the network improves when training sets of a fixed size focus on states with the least constrained free variables. For further clarity, we also include a "toy model" demonstration of how spurious correlations can inadvertently enter synthetic data sets used for training, how the performance of systems trained with these correlations can degrade dramatically, and how the inclusion of even relatively few counterexamples can effectively remedy such problems.
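The link between purity and the number of free variables can be made concrete with a small sketch. A rank-r density matrix in dimension d is specified by 2dr - r^2 - 1 real parameters, so a pure state (r = 1) needs far fewer than a full-rank mixed state (r = d). The sampler below is an assumption chosen for illustration: it draws states from the Ginibre-induced ensemble, where a hypothetical parameter `k` sets the rank and therefore how mixed the training states are on average. It is not necessarily the distribution used by the authors, and the function names are likewise illustrative.

```python
import numpy as np


def free_parameters(d, r):
    """Real parameters needed to specify a rank-r density matrix in dimension d."""
    return 2 * d * r - r * r - 1


def random_density_matrix(d, k, rng):
    """Sample rho = G G^dagger / tr(G G^dagger), with G a d x k complex Gaussian matrix.

    Assumption for illustration: k = 1 gives pure states, while larger k
    gives states that are more mixed on average.
    """
    g = rng.normal(size=(d, k)) + 1j * rng.normal(size=(d, k))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def purity(rho):
    """Purity tr(rho^2): 1 for pure states, 1/d for the maximally mixed state."""
    return float(np.real(np.trace(rho @ rho)))


def mean_purity(d, k, n, seed=0):
    """Average purity of n samples, a rough summary of how mixed a training set is."""
    rng = np.random.default_rng(seed)
    return float(np.mean([purity(random_density_matrix(d, k, rng)) for _ in range(n)]))


if __name__ == "__main__":
    d = 4  # two qubits
    for k in (1, 2, 4):
        print(k, free_parameters(d, k), round(mean_purity(d, k, 500), 3))
```

In a sketch like this, raising `k` when generating a fixed-size training set shifts it toward more mixed states, which is the kind of bias the abstract describes. How far to shift relative to the target distribution is the question the paper addresses.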