Paper Title

Optimal minimization of the covariance loss

Paper Authors

Vishesh Jain, Ashwin Sah, Mehtaab Sawhney

Paper Abstract

Let $X$ be a random vector valued in $\mathbb{R}^{m}$ such that $\|X\|_{2} \le 1$ almost surely. For every $k\ge 3$, we show that there exists a sigma algebra $\mathcal{F}$ generated by a partition of $\mathbb{R}^{m}$ into $k$ sets such that \[\|\operatorname{Cov}(X) - \operatorname{Cov}(\mathbb{E}[X\mid\mathcal{F}]) \|_{\mathrm{F}} \lesssim \frac{1}{\sqrt{\log{k}}}.\] This is optimal up to the implicit constant and improves on a previous bound due to Boedihardjo, Strohmer, and Vershynin. Our proof provides an efficient algorithm for constructing $\mathcal{F}$ and leads to improved accuracy guarantees for $k$-anonymous or differentially private synthetic data. We also establish a connection between the above problem of minimizing the covariance loss and the pinning lemma from statistical physics, providing an alternate (and much simpler) algorithmic proof in the important case when $X \in \{\pm 1\}^m/\sqrt{m}$ almost surely.
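
As orientation for the quantity above (a standard identity, not part of the abstract): the law of total covariance gives \[\operatorname{Cov}(X) = \operatorname{Cov}(\mathbb{E}[X\mid\mathcal{F}]) + \mathbb{E}[\operatorname{Cov}(X\mid\mathcal{F})],\] so the covariance loss equals $\|\mathbb{E}[\operatorname{Cov}(X\mid\mathcal{F})]\|_{\mathrm{F}}$, the Frobenius norm of the average within-cell covariance discarded when $X$ is replaced by its cell-wise mean $\mathbb{E}[X\mid\mathcal{F}]$.

The sketch below only illustrates this definition; it is not the paper's construction. It evaluates the empirical covariance loss for an arbitrary partition of a sample (here a k-means clustering, used purely as a stand-in) and prints it next to the $1/\sqrt{\log k}$ scale from the theorem. The names covariance_loss and labels are illustrative, and a heuristic partition need not attain the stated bound.

    import numpy as np
    from sklearn.cluster import KMeans

    def covariance_loss(X, labels):
        """Empirical ||Cov(X) - Cov(E[X | F])||_F, where F is the sigma algebra
        generated by the partition of the sample encoded by `labels`."""
        cov_X = np.cov(X, rowvar=False, bias=True)             # Cov(X)
        cond_mean = np.empty_like(X)                           # E[X | F]: each row -> its cell mean
        for cell in np.unique(labels):
            mask = labels == cell
            cond_mean[mask] = X[mask].mean(axis=0)
        cov_cond = np.cov(cond_mean, rowvar=False, bias=True)  # Cov(E[X | F])
        return np.linalg.norm(cov_X - cov_cond, ord="fro")

    rng = np.random.default_rng(0)
    n, m, k = 20000, 16, 64
    X = rng.normal(size=(n, m))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # rows on the unit sphere, so ||X||_2 <= 1
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(covariance_loss(X, labels), 1 / np.sqrt(np.log(k)))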
