随时间干扰的自适应实验设计：最大似然方法

论文标题

随时间干扰的自适应实验设计：最大似然方法

Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach

论文作者

Glynn, Peter, Johari, Ramesh, Rasouli, Mohammad

论文摘要

假设一个在线平台希望比较乘车共享系统中的两种匹配算法，或在线零售网站中的两种不同的库存管理算法。标准随机对照试验通常不可行，因为目标是估计整个系统的政策绩效。取而代之的是，典型的电流实践涉及在两个策略之间动态交替以固定的时间长度交替，并比较每种策略的平均性能，以作为治疗效果的估计，在它们运行的时间间隔内进行了比较。但是，这种方法受到 *时间干扰 *：一种算法改变了第二算法所见的系统状态，对治疗效果的偏见估计值。此外，此类设计的简单非自适应性质意味着它们不是样品有效的。我们开发了一个基准理论模型，在其中研究了此环境的最佳实验设计。我们认为测试这两个政策是估计两个未知马尔可夫链（即策略）之间奖励稳态差异的问题。我们假设每个链的稳态奖励通过非参数最大可能性进行进行，并寻找有效的一致（即，渐近无偏见的）实验设计（即渐近）（即渐近最小值）。表征这种设计等同于马尔可夫决策问题，并具有最小的差异目标。此类问题通常不承认可进行的解决方案。值得注意的是，在我们的环境中，使用Poisson方程对马尔可夫链的经典Martingale分析进行了新的应用，我们通过简洁的凸优化问题来表征有效的设计。我们使用这种表征来提出一种一致，高效的在线实验设计，可适应两个马尔可夫链。

Suppose an online platform wants to compare a treatment and control policy, e.g., two different matching algorithms in a ridesharing system, or two different inventory management algorithms in an online retail site. Standard randomized controlled trials are typically not feasible, since the goal is to estimate policy performance on the entire system. Instead, the typical current practice involves dynamically alternating between the two policies for fixed lengths of time, and comparing the average performance of each over the intervals in which they were run as an estimate of the treatment effect. However, this approach suffers from *temporal interference*: one algorithm alters the state of the system as seen by the second algorithm, biasing estimates of the treatment effect. Further, the simple non-adaptive nature of such designs implies they are not sample efficient. We develop a benchmark theoretical model in which to study optimal experimental design for this setting. We view testing the two policies as the problem of estimating the steady state difference in reward between two unknown Markov chains (i.e., policies). We assume estimation of the steady state reward for each chain proceeds via nonparametric maximum likelihood, and search for consistent (i.e., asymptotically unbiased) experimental designs that are efficient (i.e., asymptotically minimum variance). Characterizing such designs is equivalent to a Markov decision problem with a minimum variance objective; such problems generally do not admit tractable solutions. Remarkably, in our setting, using a novel application of classical martingale analysis of Markov chains via Poisson's equation, we characterize efficient designs via a succinct convex optimization problem. We use this characterization to propose a consistent, efficient online experimental design that adaptively samples the two Markov chains.

下载PDF全文

下载文献需遵守相关版权规定

论文标题