Paper Title

FADO: Floorplan-Aware Directive Optimization for High-Level Synthesis Designs on Multi-Die FPGAs

Authors

Linfeng Du, Tingyuan Liang, Sharad Sinha, Zhiyao Xie, Wei Zhang

Abstract

Multi-die FPGAs are widely adopted to deploy large hardware accelerators. Two factors impede the performance optimization of HLS designs implemented on multi-die FPGAs. On the one hand, the long delay of nets crossing die boundaries makes properly floorplanning and pipelining an application an NP-hard problem. On the other hand, traditional automated search flows for HLS directive optimization target single-die FPGAs, and hence cannot account for the resource constraints on each die or the timing issues incurred by die crossings. Further, because of the large design scale, legalizing the floorplan of the HLS design generated under each group of directive configurations takes an excessively long runtime during directive optimization.

To co-optimize the directives and floorplan of HLS designs on multi-die FPGAs, we propose the FADO framework, which formulates the directive-floorplan co-search problem based on multi-choice multi-dimensional bin packing and solves it with an iterative optimization flow. In each step of the directive search, a latency-bottleneck-guided greedy algorithm searches for more efficient directive configurations. For floorplanning, instead of repeatedly invoking global floorplanning algorithms, we implement a more efficient incremental floorplan legalization algorithm. It mainly applies the worst-fit online bin-packing algorithm to balance the floorplan, together with offline best-fit-decreasing re-packing to compact it, followed by pipelining of long wires that cross die boundaries.

In experiments on HLS designs mixing dataflow and non-dataflow kernels, FADO not only automates the co-optimization well, finishing with 693X to 4925X shorter runtime than DSE assisted by global floorplanning, but also improves overall workflow execution time by 1.16X to 8.78X after implementation on the Xilinx Alveo U250 FPGA.
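The incremental legalization step pairs two classic bin-packing heuristics. The Python sketch below illustrates them under stated assumptions and is not FADO's implementation: each die (SLR) is treated as a bin with a capacity vector over LUT, FF, BRAM, and DSP; each HLS module has a demand vector; and the hypothetical scoring function `slack_after` takes the smallest normalized free capacity across resource types. Worst-fit places modules online, in arrival order, on the die that keeps the most slack (balancing), while best-fit-decreasing re-packs offline, largest module first, onto the die that keeps the least slack (compacting). All capacities and module sizes are toy numbers, not real Alveo U250 figures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

RESOURCES = ("LUT", "FF", "BRAM", "DSP")
Demand = Dict[str, float]


@dataclass
class Die:
    """One die (SLR): a capacity vector plus the modules placed on it."""
    name: str
    capacity: Demand
    used: Demand = field(default_factory=lambda: {r: 0.0 for r in RESOURCES})
    modules: List[str] = field(default_factory=list)

    def slack_after(self, demand: Demand) -> float:
        # Hypothetical score: smallest normalized free capacity over all resource
        # types if `demand` were added. Negative means the die would overflow.
        # Assumes every capacity entry is nonzero.
        return min((self.capacity[r] - self.used[r] - demand[r]) / self.capacity[r]
                   for r in RESOURCES)

    def place(self, module: str, demand: Demand) -> None:
        for r in RESOURCES:
            self.used[r] += demand[r]
        self.modules.append(module)


def worst_fit_online(dies: List[Die], modules: List[Tuple[str, Demand]]) -> None:
    """Online worst-fit: in arrival order, put each module on the feasible die
    that keeps the MOST slack afterwards, which tends to balance utilization."""
    for name, demand in modules:
        feasible = [d for d in dies if d.slack_after(demand) >= 0]
        if not feasible:
            raise ValueError(f"{name} does not fit on any die")
        max(feasible, key=lambda d: d.slack_after(demand)).place(name, demand)


def best_fit_decreasing(capacities: Dict[str, Demand],
                        modules: List[Tuple[str, Demand]]) -> List[Die]:
    """Offline best-fit-decreasing: re-pack from scratch, largest module first,
    each on the feasible die that keeps the LEAST slack afterwards (compacting)."""
    dies = [Die(n, dict(c)) for n, c in capacities.items()]
    peak = {r: max(c[r] for c in capacities.values()) for r in RESOURCES}
    # Module "size" = its largest demand relative to the biggest die capacity.
    by_size = sorted(modules, key=lambda m: max(m[1][r] / peak[r] for r in RESOURCES),
                     reverse=True)
    for name, demand in by_size:
        feasible = [d for d in dies if d.slack_after(demand) >= 0]
        if not feasible:
            raise ValueError(f"{name} does not fit on any die")
        min(feasible, key=lambda d: d.slack_after(demand)).place(name, demand)
    return dies


def uniform(x: float) -> Demand:
    # Hypothetical helper: the same amount for every resource type.
    return {r: float(x) for r in RESOURCES}


caps = {f"SLR{i}": uniform(100) for i in range(4)}  # toy capacities
mods = [("load", uniform(30)), ("pe0", uniform(50)), ("pe1", uniform(50)),
        ("store", uniform(20)), ("pe2", uniform(40))]

dies = [Die(n, dict(c)) for n, c in caps.items()]
worst_fit_online(dies, mods)
print({d.name: d.modules for d in dies})
# {'SLR0': ['load'], 'SLR1': ['pe0'], 'SLR2': ['pe1'], 'SLR3': ['store', 'pe2']}

packed = best_fit_decreasing(caps, mods)
print({d.name: d.modules for d in packed})
# {'SLR0': ['pe0', 'pe1'], 'SLR1': ['pe2', 'load', 'store'], 'SLR2': [], 'SLR3': []}
```

With the same inputs, worst-fit spreads the five modules across all four dies, whereas best-fit-decreasing packs them onto two. The follow-up step in the described flow, pipelining the long wires that cross die boundaries, is not modeled in this sketch.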