Paper Title
A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks
Paper Authors
Abstract
Two-sample testing is an important problem that aims to determine whether two collections of observations follow the same distribution. We propose two-sample tests based on the integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold. We characterize the properties of the proposed tests with respect to the number of samples $n$ and the structure of the manifold with intrinsic dimension $d$. When an atlas is given, we propose a two-step test to identify the difference between general distributions, which achieves a type-II risk of order $n^{-1/\max\{d,2\}}$. When an atlas is not given, we propose a Hölder IPM test that applies to data distributions with $(s,\beta)$-Hölder densities, which achieves a type-II risk of order $n^{-(s+\beta)/d}$. To mitigate the heavy computational burden of evaluating the Hölder IPM, we approximate the Hölder function class using neural networks. Based on the approximation theory of neural networks, we show that the neural network IPM test has a type-II risk of order $n^{-(s+\beta)/d}$, which matches the order of the type-II risk of the Hölder IPM test. Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
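For readers unfamiliar with the term, an IPM is usually defined as below, together with its empirical counterpart. The notation is assumed here for illustration and is not taken from the paper: $\mathcal{F}$ is the function class, $\mu$ and $\nu$ are the two distributions, $x_i$ and $y_j$ are the samples, and $m$ is the size of the second sample.

$$\mathrm{IPM}_{\mathcal{F}}(\mu,\nu) = \sup_{f\in\mathcal{F}} \left| \mathbb{E}_{x\sim\mu}[f(x)] - \mathbb{E}_{y\sim\nu}[f(y)] \right|, \qquad \widehat{\mathrm{IPM}}_{\mathcal{F}} = \sup_{f\in\mathcal{F}} \left| \frac{1}{n}\sum_{i=1}^{n} f(x_i) - \frac{1}{m}\sum_{j=1}^{m} f(y_j) \right|.$$

A test of this kind would typically reject the hypothesis of equal distributions when the empirical quantity exceeds a threshold. Taking $\mathcal{F}$ to be a Hölder class, or a neural network class approximating it, would correspond to the Hölder IPM and neural network IPM tests named in the abstract. The exact function classes and thresholds are specified in the paper, not here. As a reading aid for the rates, $n^{-1/\max\{d,2\}}$ equals $n^{-1/2}$ when $d \le 2$ and $n^{-1/d}$ when $d > 2$, so the rate depends on the intrinsic dimension $d$ and not on the ambient data dimension.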