Paper Title

Optimal minimization of the covariance loss

Paper Authors

Vishesh Jain, Ashwin Sah, Mehtaab Sawhney

Paper Abstract

Let $X$ be a random vector valued in $\mathbb{R}^{m}$ such that $\|X\|_{2} \le 1$ almost surely. For every $k\ge 3$, we show that there exists a sigma algebra $\mathcal{F}$ generated by a partition of $\mathbb{R}^{m}$ into $k$ sets such that \[\|\operatorname{Cov}(X) - \operatorname{Cov}(\mathbb{E}[X\mid\mathcal{F}]) \|_{\mathrm{F}} \lesssim \frac{1}{\sqrt{\log{k}}}.\] This is optimal up to the implicit constant and improves on a previous bound due to Boedihardjo, Strohmer, and Vershynin. Our proof provides an efficient algorithm for constructing $\mathcal{F}$ and leads to improved accuracy guarantees for $k$-anonymous or differentially private synthetic data. We also establish a connection between the above problem of minimizing the covariance loss and the pinning lemma from statistical physics, providing an alternate (and much simpler) algorithmic proof in the important case when $X \in \{\pm 1\}^m/\sqrt{m}$ almost surely.
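
As orientation for the quantity above (a standard identity, not part of the abstract): the law of total covariance gives \[\operatorname{Cov}(X) = \operatorname{Cov}(\mathbb{E}[X\mid\mathcal{F}]) + \mathbb{E}[\operatorname{Cov}(X\mid\mathcal{F})],\] so the covariance loss equals $\|\mathbb{E}[\operatorname{Cov}(X\mid\mathcal{F})]\|_{\mathrm{F}}$, the Frobenius norm of the average within-cell covariance discarded when $X$ is replaced by its cell-wise mean $\mathbb{E}[X\mid\mathcal{F}]$.

The sketch below only illustrates this definition; it is not the paper's construction. It evaluates the empirical covariance loss for an arbitrary partition of a sample (here a k-means clustering, used purely as a stand-in) and prints it next to the $1/\sqrt{\log k}$ scale from the theorem. The names covariance_loss and labels are illustrative, and a heuristic partition need not attain the stated bound.

    import numpy as np
    from sklearn.cluster import KMeans

    def covariance_loss(X, labels):
        """Empirical ||Cov(X) - Cov(E[X | F])||_F, where F is the sigma algebra
        generated by the partition of the sample encoded by `labels`."""
        cov_X = np.cov(X, rowvar=False, bias=True)             # Cov(X)
        cond_mean = np.empty_like(X)                           # E[X | F]: each row -> its cell mean
        for cell in np.unique(labels):
            mask = labels == cell
            cond_mean[mask] = X[mask].mean(axis=0)
        cov_cond = np.cov(cond_mean, rowvar=False, bias=True)  # Cov(E[X | F])
        return np.linalg.norm(cov_X - cov_cond, ord="fro")

    rng = np.random.default_rng(0)
    n, m, k = 20000, 16, 64
    X = rng.normal(size=(n, m))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # rows on the unit sphere, so ||X||_2 <= 1
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(covariance_loss(X, labels), 1 / np.sqrt(np.log(k)))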
