Paper Title

LUXOR: An FPGA Logic Cell Architecture for Efficient Compressor Tree Implementations

Authors

SeyedRamin Rasoulinezhad, Siddhartha, Hao Zhou, Lingli Wang, David Boland, Philip H. W. Leong

Abstract

We propose two tiers of modifications to FPGA logic cell architecture to deliver a variety of performance and utilization benefits with only minor area overheads. In the first tier, we augment existing commercial logic cell datapaths with a 6-input XOR gate in order to improve the expressiveness of each element, while maintaining backward compatibility. This new architecture is vendor-agnostic, and we refer to it as LUXOR. We also consider a secondary tier of vendor-specific modifications to both Xilinx and Intel FPGAs, which we refer to as X-LUXOR+ and I-LUXOR+ respectively. We demonstrate that compressor tree synthesis using generalized parallel counters (GPCs) is further improved with the proposed modifications. Using both the Intel adaptive logic module and the Xilinx slice at the 65nm technology node for a comparative study, it is shown that the silicon area overhead is less than 0.5% for LUXOR and 5-6% for LUXOR+, while the delay increments are 1-6% and 3-9% respectively. We demonstrate that LUXOR can deliver an average reduction of 13-19% in logic utilization on micro-benchmarks from a variety of domains. BNN benchmarks benefit the most with an average reduction of 37-47% in logic utilization, which is due to the highly efficient mapping of the XnorPopcount operation on our proposed LUXOR+ logic cells.
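For readers unfamiliar with the terms in the abstract, the following is a minimal software sketch of the arithmetic involved, not the authors' hardware design or mapping. The function names (`xnor_popcount`, `gpc_6_3`, `xor6`) are hypothetical and chosen for illustration only. It shows what an XnorPopcount computes, what a (6:3) generalized parallel counter outputs, and the standard fact that the XOR of six bits equals the least significant bit of their sum.

```python
from itertools import product


def xnor_popcount(a: int, w: int, n: int) -> int:
    """Count the bit positions (out of n) where a and w agree."""
    mask = (1 << n) - 1
    # XNOR is the complement of XOR; the mask keeps only the n valid bits.
    return bin(~(a ^ w) & mask).count("1")


def gpc_6_3(bits):
    """(6:3) counter: six equal-weight input bits -> 3-bit sum, LSB first."""
    assert len(bits) == 6
    total = sum(bits)
    return (total & 1, (total >> 1) & 1, (total >> 2) & 1)


def xor6(bits):
    """6-input XOR (parity) of the input bits."""
    parity = 0
    for b in bits:
        parity ^= b
    return parity


# Parity of six bits always matches the LSB of their sum.
all_match = all(xor6(b) == gpc_6_3(b)[0] for b in product((0, 1), repeat=6))
```

In a binarized neural network, a dot product of n values in {-1, +1} can be recovered from this count as 2 * xnor_popcount(a, w, n) - n, which is why the operation dominates BNN workloads.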
