A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing

Paper title

A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing

Authors

Wen, Ruofeng; Zeng, Wenjun; Liu, Yi

Abstract

Amazon Customer Service provides real-time support for millions of customer contacts every year. While bot-resolver helps automate some traffic, we still see high demand for human agents, also called subject matter experts (SMEs). Customers reach out with questions in different domains (return policy, device troubleshooting, etc.). Depending on their training, not all SMEs are eligible to handle all contacts. Routing contacts to eligible SMEs turns out to be a non-trivial problem because SMEs' domain eligibility is subject to training quality and can change over time. To optimally recommend SMEs while simultaneously learning the true eligibility status, we propose to formulate the routing problem with a nonparametric contextual bandit algorithm (K-Boot) plus an eligibility control (EC) algorithm. K-Boot models reward with a kernel smoother on similar past samples selected by $k$-NN, and uses Bootstrap Thompson Sampling for exploration. EC filters arms (SMEs) by the initially system-claimed eligibility and dynamically validates the reliability of this information. The proposed K-Boot is a general bandit algorithm, and EC is applicable to other bandits. Our simulation studies show that K-Boot performs on par with state-of-the-art bandit models, and EC boosts K-Boot performance when a stochastic eligibility signal exists.
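
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is based only on the abstract and is not the authors' implementation. The Gaussian kernel, Euclidean distance, uniform draw for arms with no data, and all function and parameter names (`kboot_score`, `choose_arm`, `history`, `eligible`, `bandwidth`) are assumptions made for illustration. Only the static eligibility filter is shown; the dynamic validation of eligibility that EC performs is not specified in the abstract and is left out.

```python
import numpy as np


def kboot_score(context, X, r, k=5, bandwidth=1.0, rng=None):
    """One bootstrap draw of a kernel-smoothed reward estimate for one arm.

    X holds the arm's past contexts (n x d) and r the matching rewards.
    """
    rng = np.random.default_rng() if rng is None else rng
    context = np.asarray(context, dtype=float)
    X = np.asarray(X, dtype=float).reshape(-1, len(context))
    r = np.asarray(r, dtype=float)
    if len(r) == 0:
        # Assumption: an arm without data gets a uniform random score
        # so that it is still explored.
        return float(rng.uniform())
    dist = np.linalg.norm(X - context, axis=1)
    nearest = np.argsort(dist)[:k]  # indices of the k nearest past samples
    # Bootstrap: resample the neighbours with replacement. The randomness of
    # this resample plays the role of the posterior draw in Thompson Sampling.
    boot = rng.choice(nearest, size=len(nearest), replace=True)
    weights = np.exp(-0.5 * (dist[boot] / bandwidth) ** 2)  # Gaussian kernel
    if weights.sum() == 0.0:
        # All weights underflowed; fall back to an unweighted mean.
        return float(r[boot].mean())
    return float(np.sum(weights * r[boot]) / np.sum(weights))


def choose_arm(context, history, eligible, k=5, bandwidth=1.0, rng=None):
    """Pick the eligible arm with the highest bootstrap score.

    history maps arm -> (X, r); eligible maps arm -> bool (claimed eligibility).
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = [arm for arm in history if eligible.get(arm, False)]
    if not candidates:
        raise ValueError("no eligible arm for this contact")
    scores = {
        arm: kboot_score(context, *history[arm], k=k, bandwidth=bandwidth, rng=rng)
        for arm in candidates
    }
    return max(scores, key=scores.get)
```

In this sketch each arm would be an SME and each context a feature vector describing the contact. After a contact is routed, the observed reward would be appended to that arm's entry in `history` before the next call to `choose_arm`.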