Paper Title
Decision Market Based Learning For Multi-agent Contextual Bandit Problems
Paper Authors
Paper Abstract
Information is often stored in a distributed and proprietary form, and agents who own information are often self-interested and require incentives to reveal their information. Suitable mechanisms are required to elicit and aggregate such distributed information for decision making. In this paper, we use simulations to investigate the use of decision markets as mechanisms in a multi-agent learning system to aggregate distributed information for decision-making in a contextual bandit problem. The system utilises strictly proper decision scoring rules to assess the accuracy of probabilistic reports from agents, which allows agents to learn to solve the contextual bandit problem jointly. Our simulations show that our multi-agent system with distributed information can be trained as efficiently as a centralised counterpart with a single agent that receives all information. Moreover, we use our system to investigate scenarios with deterministic decision scoring rules which are not incentive compatible. We observe the emergence of more complex dynamics with manipulative behaviour, which agrees with existing theoretical analyses.
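To make the idea of a strictly proper decision scoring rule concrete, here is a minimal Python sketch. The abstract does not state which rule the paper uses, so everything in it is an assumption chosen for illustration: the logarithmic scoring rule, the binary reward, the inverse-probability weighting of the score for the chosen action, and all function and variable names. The sketch only checks that, when actions are chosen stochastically with full support, a truthful report earns a higher expected score than a misreport.

```python
import math

def log_score(report, outcome):
    """Logarithmic scoring rule for a binary outcome (1 = reward, 0 = no reward).
    `report` is the reported probability of outcome 1 and must lie in (0, 1)."""
    return math.log(report if outcome == 1 else 1.0 - report)

def decision_score(reports, action, outcome, action_probs):
    """Score only the report for the action that was actually taken, weighted by
    the inverse of the probability with which that action was selected.
    (Illustrative assumption; not confirmed as the paper's exact rule.)"""
    return log_score(reports[action], outcome) / action_probs[action]

def expected_decision_score(reports, true_probs, action_probs):
    """Expected score over the random action choice and the random outcome."""
    total = 0.0
    for action, phi in enumerate(action_probs):
        for outcome, p in ((1, true_probs[action]), (0, 1.0 - true_probs[action])):
            total += phi * p * decision_score(reports, action, outcome, action_probs)
    return total

# Hypothetical numbers: two actions, true reward probabilities, and a
# stochastic decision rule that gives every action a positive probability.
true_probs = [0.7, 0.4]
action_probs = [0.9, 0.1]

truthful = expected_decision_score(true_probs, true_probs, action_probs)
misreport = expected_decision_score([0.7, 0.8], true_probs, action_probs)
print(truthful > misreport)
```

The inverse-probability weight cancels the selection probability in expectation, so an agent cannot gain by distorting its report for an action that is unlikely to be chosen. With a deterministic decision rule some actions have zero selection probability, this weighting is no longer available, and the incentive to report truthfully can break down, which is the setting the abstract describes as not incentive compatible.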