Paper Title
Accelerating Monte-Carlo Tree Search on CPU-FPGA Heterogeneous Platform
Paper Authors
Paper Abstract
Monte Carlo Tree Search (MCTS) methods have achieved great success in many Artificial Intelligence (AI) benchmarks. In-tree operations become a critical performance bottleneck when parallel MCTS is implemented on CPUs. In this work, we develop a scalable CPU-FPGA system for Tree-Parallel MCTS. We propose a novel decomposition and mapping of the MCTS data structures and computation onto the CPU and FPGA to reduce communication and coordination. Our system achieves high scalability by encapsulating the in-tree operations in an SRAM-based FPGA accelerator. To reduce the high data access latency and inter-worker synchronization overheads, we develop several hardware optimizations. We show that our accelerator yields up to $35\times$ speedup for in-tree operations and $3\times$ higher overall system throughput. Our CPU-FPGA system also scales better with the number of parallel workers than state-of-the-art parallel MCTS implementations on CPU.