Quantum bandit with amplitude amplification exploration in an adversarial environment

Title

Quantum bandit with amplitude amplification exploration in an adversarial environment

Authors

Byungjin Cho, Yu Xiao, Pan Hui, Daoyi Dong

Abstract

The rapid proliferation of learning systems in arbitrarily changing environments requires managing the tension between exploration and exploitation. This work proposes a quantum-inspired bandit learning approach to the learning-and-adapting-based offloading problem, in which a client observes and learns the cost of each task offloaded to candidate resource providers, e.g., fog nodes. The approach adopts a new action-update strategy inspired by amplitude amplification and a novel probabilistic action-selection rule inspired by the collapse postulate of quantum computation theory. We devise a locally linear mapping between a quantum-mechanical phase in the quantum domain (e.g., a Grover-type search algorithm) and a distilled probability magnitude in the value-based decision-making domain (e.g., an adversarial multi-armed bandit algorithm). Via this mapping, the proposed algorithm is generalized to better adjust learning weights on favourable and unfavourable actions, and its effectiveness is verified through simulation.
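The abstract names the ingredients but not the update rule itself. As a rough, non-authoritative illustration of those ingredients, the Python sketch below keeps one real, non-negative amplitude per arm (candidate resource provider), selects an arm with probability equal to its squared amplitude (a Born-rule reading of the collapse postulate), and amplifies or suppresses the chosen arm with a Grover-style rotation in the plane spanned by that arm and the normalized remaining arms. The linear cost-to-angle map `eta * (0.5 - cost)`, the uniform exploration mix `gamma` borrowed from EXP3, the toy cost function, and all function names (`select_arm`, `rotate`, `run`) are hypothetical choices for this sketch; they are not the authors' locally linear mapping or their algorithm.

```python
import math
import random


def select_arm(amplitudes, gamma, rng):
    """Pick arm i with probability (1 - gamma) * a_i**2 + gamma / K.

    The a_i**2 term mimics a measurement under the collapse postulate; the
    gamma / K term is an EXP3-style uniform exploration floor (an assumption).
    """
    k = len(amplitudes)
    probs = [(1 - gamma) * a * a + gamma / k for a in amplitudes]
    return rng.choices(range(k), weights=probs, k=1)[0]


def rotate(amplitudes, arm, delta):
    """Grover-style rotation of the amplitude vector by angle delta.

    The state is viewed as sin(theta)|arm> + cos(theta)|rest>, where |rest>
    is the normalized remainder. delta > 0 amplifies `arm`, delta < 0
    suppresses it; ratios inside |rest> are kept and the result is unit-norm.
    """
    k = len(amplitudes)
    a_t = amplitudes[arm]
    rest_norm = math.sqrt(sum(a * a for i, a in enumerate(amplitudes) if i != arm))
    theta = math.atan2(a_t, rest_norm)                     # in [0, pi/2]
    new_theta = min(max(theta + delta, 0.0), math.pi / 2)  # stay in first quadrant
    new_rest = math.cos(new_theta)
    out = []
    for i, a in enumerate(amplitudes):
        if i == arm:
            out.append(math.sin(new_theta))
        elif rest_norm > 1e-12:
            out.append(a * new_rest / rest_norm)           # rescale the remainder
        else:
            out.append(new_rest / math.sqrt(k - 1))        # remainder was ~0: spread evenly
    return out


def run(cost_fn, n_arms, rounds, eta=0.1, gamma=0.05, seed=0):
    """Hypothetical offloading loop; cost_fn(t, arm) returns a cost in [0, 1]."""
    rng = random.Random(seed)
    amps = [1 / math.sqrt(n_arms)] * n_arms                # uniform superposition
    total_cost = 0.0
    for t in range(rounds):
        arm = select_arm(amps, gamma, rng)
        cost = cost_fn(t, arm)
        total_cost += cost
        # Assumed linear cost-to-angle map: cheap arms rotate up, costly arms down.
        amps = rotate(amps, arm, eta * (0.5 - cost))
    return amps, total_cost


if __name__ == "__main__":
    # Toy fixed environment: fog node 2 is cheap, the others are expensive.
    amps, total = run(lambda t, arm: 0.0 if arm == 2 else 1.0, n_arms=4, rounds=2000)
    print([round(a * a, 3) for a in amps], total)
```

In this toy setting the amplitude of the cheap arm only grows and the vector stays unit-norm. Replacing `cost_fn` with a time-varying or adversarial cost schedule is where the `gamma` exploration floor matters, since it lets a fully suppressed arm be sampled and re-amplified.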
