样本驱动的最佳停止：从秘书问题到I.I.D.先知不平等

论文标题

样本驱动的最佳停止：从秘书问题到I.I.D.先知不平等

Sample-driven optimal stopping: From the secretary problem to the i.i.d. prophet inequality

论文作者

Correa, José, Cristi, Andrés, Epstein, Boris, Soto, José

论文摘要

我们采用一种统一的方法来进行单个选择的最佳停止问题，并随机到达顺序和项目独立采样。在我们考虑的问题中，决策者（DM）最初可以独立于$ n $ p $进行$ n $项目中的每一个，并可以观察这些采样项目的相对排名。然后，DM以在线方式面对其余项目，观察所有显示项目的相对排名。在扫描序列的同时，DM可以做出不可撤销的停止/继续决策，并且她的奖励是停止使用等级$ i $ $ $ y_i $的序列。 DM的目标是最大程度地提高她的奖励。我们首先研究DM已知$ y_i $的情况，然后转到这些值是对抗性的情况。对于以前的情况，我们编写了自然线性程序，该程序捕获了算法的性能，并达到其连续限制。我们证明了有关这种连续限制的结构性结果，这使我们能够将问题减少到一个相对简单的实际优化问题。我们确定最佳算法由一系列阈值$ t_1 \ le t_2 \ le \ cdots $给出，因此，如果看到具有当前排名$ i $ aft time $ t_i $的项目，则DM应该停止。此外，我们能够在该领域中恢复一些经典结果，例如秘书问题和最低排名问题。对于对抗情况，我们获得了一个具有附加随机优势约束的类似线性程序。使用相同的机械，我们能够以$ p $的所有值固定最佳竞争比率。值得注意的是，我们证明，随着$ p $的速度1，我们的保证将线性收敛至0.745，与I.I.D.〜先知不平等的趋势相匹配。 $ p = 1/2 $也很有趣，我们的约束评估为$ 0.671 $，这对最新情况有所改善。

We take a unifying approach to single selection optimal stopping problems with random arrival order and independent sampling of items. In the problem we consider, a decision maker (DM) initially gets to sample each of $N$ items independently with probability $p$, and can observe the relative rankings of these sampled items. Then, the DM faces the remaining items in an online fashion, observing the relative rankings of all revealed items. While scanning the sequence the DM makes irrevocable stop/continue decisions and her reward for stopping the sequence facing the item with rank $i$ is $Y_i$. The goal of the DM is to maximize her reward. We start by studying the case in which the values $Y_i$ are known to the DM, and then move to the case in which these values are adversarial. For the former case, we write the natural linear program that captures the performance of an algorithm, and take its continuous limit. We prove a structural result about this continuous limit, which allows us to reduce the problem to a relatively simple real optimization problem. We establish that the optimal algorithm is given by a sequence of thresholds $t_1\le t_2\le\cdots$ such that the DM should stop if seeing an item with current ranking $i$ after time $t_i$. Additionally we are able to recover several classic results in the area such as those for secretary problem and the minimum ranking problem. For the adversarial case, we obtain a similar linear program with an additional stochastic dominance constraint. Using the same machinery we are able to pin down the optimal competitive ratios for all values of $p$. Notably, we prove that as $p$ approaches 1, our guarantee converges linearly to 0.745, matching that of the i.i.d.~prophet inequality. Also interesting is the case $p=1/2$, where our bound evaluates to $0.671$, which improves upon the state of the art.

下载PDF全文

下载文献需遵守相关版权规定

论文标题