Paper title
Decoding surface codes with deep reinforcement learning and probabilistic policy reuse
Paper authors
Paper abstract
Quantum computing (QC) promises significant advantages over classical computers on certain hard computational tasks. However, current quantum hardware, also known as noisy intermediate-scale quantum (NISQ) computers, is still unable to carry out computations faithfully, mainly because of the lack of quantum error correction (QEC) capability. A large body of theoretical work has provided various types of QEC codes; one notable topological code is the surface code, whose features, such as requiring only nearest-neighbor two-qubit control gates and having a large error threshold, make it a leading candidate for scalable quantum computation. Recently developed machine learning (ML)-based techniques, especially reinforcement learning (RL) methods, have been applied to the decoding problem and have already made certain progress. Nevertheless, the device noise pattern may change over time, rendering trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement a double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce the computational complexity. Moreover, increasing the number of trained policies can further improve the agent's performance. Our results open a way to build more capable RL agents that can leverage previously gained knowledge to tackle QEC challenges.
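To make the two ingredients named in the abstract concrete, the sketch below combines a double Q-learning update with probabilistic policy reuse (reuse a previously trained policy with a decaying probability, otherwise act greedily on the current estimate). This is a minimal, illustrative sketch only: it uses a small random tabular MDP as a stand-in for the surface code decoding environment, and all names and hyperparameters (the toy MDP, `old_policy`, `psi`, decay rates) are assumptions, not the authors' DDQN-PPR implementation, which uses deep Q-networks and a syndrome-measurement simulator.

```python
import numpy as np

# Minimal sketch: double Q-learning + probabilistic policy reuse (pi-reuse).
# A small random MDP stands in for the surface code decoding environment;
# everything here is illustrative, not the paper's implementation.

rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha = 20, 4, 0.95, 0.1

# Random MDP dynamics and rewards (placeholder for syndrome decoding).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions
R = rng.normal(size=(n_states, n_actions))                        # rewards

# A previously trained policy, e.g. learned under an older noise pattern.
old_policy = rng.integers(n_actions, size=n_states)

# Two Q-tables for double Q-learning.
Q1 = np.zeros((n_states, n_actions))
Q2 = np.zeros((n_states, n_actions))

def select_action(s, psi, epsilon):
    """pi-reuse: with probability psi follow the old policy, otherwise act
    epsilon-greedily with respect to the current estimate Q1 + Q2."""
    if rng.random() < psi:
        return int(old_policy[s])
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q1[s] + Q2[s]))

psi, psi_decay, epsilon = 0.9, 0.999, 0.1
s = int(rng.integers(n_states))
for step in range(20000):
    a = select_action(s, psi, epsilon)
    s_next = int(rng.choice(n_states, p=P[s, a]))
    r = R[s, a]
    # Double Q-learning: choose the argmax with one table, evaluate it with
    # the other, and update a randomly selected table to reduce overestimation.
    if rng.random() < 0.5:
        a_star = int(np.argmax(Q1[s_next]))
        Q1[s, a] += alpha * (r + gamma * Q2[s_next, a_star] - Q1[s, a])
    else:
        a_star = int(np.argmax(Q2[s_next]))
        Q2[s, a] += alpha * (r + gamma * Q1[s_next, a_star] - Q2[s, a])
    psi *= psi_decay  # gradually rely less on the reused policy
    s = s_next

greedy_policy = np.argmax(Q1 + Q2, axis=1)
print(greedy_policy)
```

The decaying reuse probability is the key design choice: early in training the agent benefits from the knowledge encoded in the old policy, while the decay lets it depart from that policy once the new noise pattern makes the reused behavior suboptimal.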