预测聚合的样本复杂性

论文标题

预测聚合的样本复杂性

Sample Complexity of Forecast Aggregation

论文作者

Lin, Tao, Chen, Yiling

论文摘要

我们考虑了贝叶斯的预测汇总模型，在该模型中，$ n $专家在观察了关于未知二进制事件的私人信号后，向校长报告了他们对事件的后验信念，然后将报告汇总为事件的单个预测。专家的信号和事件的结果遵循校长未知的联合分配，但校长可以访问I.I.D.来自分布的“样本”，每个样本都是专家报告的元组（不是信号）和事件的实现。使用这些样品，主要目的是找到一个$ \ varepsilon $ - 符合最佳聚合器，在该聚合器中，根据聚合预测与事件实现之间的预期平方距离来测量最佳性。我们表明，对于任意离散分布，此问题的样本复杂性至少为$ \tildeΩ（m^{n-2} / \ varepsilon）$，其中$ m $是每个专家信号空间的大小。该样本复杂性在专家$ n $的数量中成倍增长。但是，如果专家的信号是独立的，以实现事件的实现为条件，那么样本复杂性将大大降低到$ \ tilde o（1 / \ varepsilon^2）$，这不取决于$ n $。我们的结果可以推广到非二进制事件。我们结果的证明使用了分配学习问题的减少，并揭示了一个事实，即预测聚合几乎与分配学习一样困难。

We consider a Bayesian forecast aggregation model where $n$ experts, after observing private signals about an unknown binary event, report their posterior beliefs about the event to a principal, who then aggregates the reports into a single prediction for the event. The signals of the experts and the outcome of the event follow a joint distribution that is unknown to the principal, but the principal has access to i.i.d. "samples" from the distribution, where each sample is a tuple of the experts' reports (not signals) and the realization of the event. Using these samples, the principal aims to find an $\varepsilon$-approximately optimal aggregator, where optimality is measured in terms of the expected squared distance between the aggregated prediction and the realization of the event. We show that the sample complexity of this problem is at least $\tilde Ω(m^{n-2} / \varepsilon)$ for arbitrary discrete distributions, where $m$ is the size of each expert's signal space. This sample complexity grows exponentially in the number of experts $n$. But, if the experts' signals are independent conditioned on the realization of the event, then the sample complexity is significantly reduced, to $\tilde O(1 / \varepsilon^2)$, which does not depend on $n$. Our results can be generalized to non-binary events. The proof of our results uses a reduction from the distribution learning problem and reveals the fact that forecast aggregation is almost as difficult as distribution learning.

下载PDF全文

下载文献需遵守相关版权规定

论文标题