从不受信任的批次学习结构化分布：更快，更简单

论文标题

从不受信任的批次学习结构化分布：更快，更简单

Learning Structured Distributions From Untrusted Batches: Faster and Simpler

论文作者

Chen, Sitan, Li, Jerry, Moitra, Ankur

论文摘要

我们从Qiao和Valiant [QV17]引入的不信任批处理中学习学习问题。最近，Ja那教和Orlitsky [JO19]基于剪切 - 基于剪切状态提供了一种简单的半决赛编程方法，该方法基本上在多项式时间内实现了理论上最佳的最佳误差。同时，Chen等人。 [CLM19]认为假定$μ$结构化的问题的变体，例如log-concave，单调危险率，$ t $ - 模态等。在这种情况下，在$ n $中的样本复杂性sublinear中可能会达到相同的错误，并且它们使用haar wavelets展示了用于这样做的准多发时间算法。在本文中，我们找到了一种吸引人的方法，可以综合[JO19]和[CLM19]的技术，以提供两全其美的最佳：一种在多项式时间内运行的算法，并可以利用基本分布中的结构以实现均方根性样本的复杂性。在此过程中，我们通过避免了对SDP圆形的需求并通过软滤镜的镜头对其进行更直接的解释来简化[JO19]的方法，这是一种在高维鲁棒估计中的最新技术。我们验证算法在初步实验评估中的实用性。

We revisit the problem of learning from untrusted batches introduced by Qiao and Valiant [QV17]. Recently, Jain and Orlitsky [JO19] gave a simple semidefinite programming approach based on the cut-norm that achieves essentially information-theoretically optimal error in polynomial time. Concurrently, Chen et al. [CLM19] considered a variant of the problem where $μ$ is assumed to be structured, e.g. log-concave, monotone hazard rate, $t$-modal, etc. In this case, it is possible to achieve the same error with sample complexity sublinear in $n$, and they exhibited a quasi-polynomial time algorithm for doing so using Haar wavelets. In this paper, we find an appealing way to synthesize the techniques of [JO19] and [CLM19] to give the best of both worlds: an algorithm which runs in polynomial time and can exploit structure in the underlying distribution to achieve sublinear sample complexity. Along the way, we simplify the approach of [JO19] by avoiding the need for SDP rounding and giving a more direct interpretation of it through the lens of soft filtering, a powerful recent technique in high-dimensional robust estimation. We validate the usefulness of our algorithms in preliminary experimental evaluations.

下载PDF全文

下载文献需遵守相关版权规定

论文标题