Heterogeneous Explore-Exploit Strategies on Multi-Star Networks

Paper Title

Heterogeneous Explore-Exploit Strategies on Multi-Star Networks

Authors

Udari Madhushani, Naomi Leonard

Abstract

We investigate the benefits of heterogeneity in multi-agent explore-exploit decision making where the goal of the agents is to maximize cumulative group reward. To do so, we study a class of distributed stochastic bandit problems in which agents communicate over a multi-star network and make sequential choices among options in the same uncertain environment. Typically, in multi-agent bandit problems, agents use homogeneous decision-making strategies. However, group performance can be improved by incorporating heterogeneity into the choices agents make, especially when the network graph is irregular, i.e., when agents have different numbers of neighbors. We design and analyze new heterogeneous explore-exploit strategies, using the multi-star as the model irregular network graph. The key idea is to enable center agents to do more exploring than they would do using the homogeneous strategy, as a means of providing more useful data to the peripheral agents. In the case where all agents broadcast their reward values and choices to their neighbors with the same probability, we provide theoretical guarantees that group performance improves under the proposed heterogeneous strategies compared with homogeneous strategies. We use numerical simulations to illustrate our results and to validate our theoretical bounds.

Paper Title