Paper title
GRAPHIC: GatheR-And-Process in Highly parallel with In-SSD Compression Architecture in Very Large-Scale Graph
Paper authors
Paper abstract
Graph convolutional network (GCN), an emerging algorithm for graph computing, has achieved promising performance in graph-structure tasks. To accelerate data-intensive and sparse graph computing, ASICs such as GCNAX have been proposed for efficient execution of aggregation and combination in GCN. GCNAX reduces DRAM accesses by 8x compared with previous efforts. However, as graphs have reached terabytes in size, off-chip data movement from SSD to DRAM becomes a serious latency bottleneck. This paper proposes Compressive Graph Transmission (CGTrans), which performs aggregation inside the SSD and dramatically relieves the transfer latency bottleneck caused by SSD loading, compared with CMOS-based graph accelerator ASICs. CGTrans requires an in-SSD computing technique. Recently, Insider was proposed as a near-SSD processing system that integrates an FPGA in the SSD. However, Insider still suffers from low area efficiency, which limits the performance of CGTrans. To overcome the area efficiency problem, we utilize the recently proposed Fully Concurrent Access Technique (FAST) and propose FAST-GAS, an in-SSD graph computing accelerator that provides highly concurrent gather-and-scatter operations. We propose the GRAPHIC system, which contains the CGTrans dataflow deployed on FAST-GAS. Experiments show that CGTrans reduces SSD loading by a factor of 50, while GRAPHIC achieves 3.6x and 2.4x speedup on average over GCNAX and over CGTrans on Insider, respectively.
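The intuition behind aggregating inside the SSD can be illustrated with a toy model. The sketch below is not the paper's dataflow: the graph, the feature sizes, and all function names are hypothetical, and sum aggregation is assumed as the gather operator. It only compares how many feature values would cross the SSD link when the host aggregates raw neighbor features versus when aggregation happens before transfer.

```python
# Illustrative sketch only: a toy model of "aggregate before transfer".
# The graph, feature sizes, and function names are hypothetical and are
# not taken from the paper.

def aggregate(adj, feats, targets):
    """Sum-aggregate neighbor features for each target node (GCN-style gather)."""
    dim = len(next(iter(feats.values())))
    out = {}
    for t in targets:
        acc = [0.0] * dim
        for n in adj[t]:
            for i, v in enumerate(feats[n]):
                acc[i] += v
        out[t] = acc
    return out


def values_moved_raw(adj, feats, targets):
    """Feature values crossing the SSD link if the host does the aggregation."""
    dim = len(next(iter(feats.values())))
    return sum(len(adj[t]) for t in targets) * dim


def values_moved_aggregated(feats, targets):
    """Feature values crossing the link if aggregation happens before transfer."""
    dim = len(next(iter(feats.values())))
    return len(targets) * dim


# Toy graph: node -> list of neighbors (assumed data).
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
feats = {n: [float(n), 1.0] for n in adj}  # 2-dimensional features
targets = [0, 2]

agg = aggregate(adj, feats, targets)
raw = values_moved_raw(adj, feats, targets)
compressed = values_moved_aggregated(feats, targets)
```

In this toy model the reduction in transferred values equals the average degree of the target nodes, which suggests why the benefit could grow on dense, very large graphs. The 50x figure reported in the abstract comes from the authors' experiments and is not reproduced by this sketch.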