论文标题
从不受信任的批次学习结构化分布:更快,更简单
Learning Structured Distributions From Untrusted Batches: Faster and Simpler
论文作者
论文摘要
我们从Qiao和Valiant [QV17]引入的不信任批处理中学习学习问题。最近,Ja那教和Orlitsky [JO19]基于剪切 - 基于剪切状态提供了一种简单的半决赛编程方法,该方法基本上在多项式时间内实现了理论上最佳的最佳误差。同时,Chen等人。 [CLM19]认为假定$μ$结构化的问题的变体,例如log-concave,单调危险率,$ t $ - 模态等。在这种情况下,在$ n $中的样本复杂性sublinear中可能会达到相同的错误,并且它们使用haar wavelets展示了用于这样做的准多发时间算法。 在本文中,我们找到了一种吸引人的方法,可以综合[JO19]和[CLM19]的技术,以提供两全其美的最佳:一种在多项式时间内运行的算法,并可以利用基本分布中的结构以实现均方根性样本的复杂性。在此过程中,我们通过避免了对SDP圆形的需求并通过软滤镜的镜头对其进行更直接的解释来简化[JO19]的方法,这是一种在高维鲁棒估计中的最新技术。我们验证算法在初步实验评估中的实用性。
We revisit the problem of learning from untrusted batches introduced by Qiao and Valiant [QV17]. Recently, Jain and Orlitsky [JO19] gave a simple semidefinite programming approach based on the cut-norm that achieves essentially information-theoretically optimal error in polynomial time. Concurrently, Chen et al. [CLM19] considered a variant of the problem where $μ$ is assumed to be structured, e.g. log-concave, monotone hazard rate, $t$-modal, etc. In this case, it is possible to achieve the same error with sample complexity sublinear in $n$, and they exhibited a quasi-polynomial time algorithm for doing so using Haar wavelets. In this paper, we find an appealing way to synthesize the techniques of [JO19] and [CLM19] to give the best of both worlds: an algorithm which runs in polynomial time and can exploit structure in the underlying distribution to achieve sublinear sample complexity. Along the way, we simplify the approach of [JO19] by avoiding the need for SDP rounding and giving a more direct interpretation of it through the lens of soft filtering, a powerful recent technique in high-dimensional robust estimation. We validate the usefulness of our algorithms in preliminary experimental evaluations.