Paper Title

TinyKG: Memory-Efficient Training Framework for Knowledge Graph Neural Recommender Systems

Paper Authors

Huiyuan Chen, Xiaoting Li, Kaixiong Zhou, Xia Hu, Chin-Chia Michael Yeh, Yan Zheng, Hao Yang

Paper Abstract

There has been an explosion of interest in designing various Knowledge Graph Neural Networks (KGNNs), which achieve state-of-the-art performance and provide great explainability for recommendation. The promising performance mainly results from their ability to capture high-order proximity messages over knowledge graphs. However, training KGNNs at scale is challenging due to their high memory usage. In the forward pass, automatic differentiation engines (e.g., TensorFlow/PyTorch) generally need to cache all intermediate activation maps in order to compute gradients in the backward pass, which leads to a large GPU memory footprint. Existing work addresses this problem with multi-GPU distributed frameworks. Nonetheless, this poses a practical challenge when seeking to deploy KGNNs in memory-constrained environments, especially for industry-scale graphs. Here we present TinyKG, a memory-efficient GPU-based training framework for KGNNs for recommendation tasks. Specifically, TinyKG uses exact activations in the forward pass while storing a quantized version of the activations in the GPU buffers. During the backward pass, these low-precision activations are dequantized back to full-precision tensors in order to compute gradients. To reduce quantization error, TinyKG applies a simple yet effective quantization algorithm to compress the activations, ensuring unbiasedness with low variance. As a result, the training memory footprint of KGNNs is greatly reduced with negligible accuracy loss. To evaluate the performance of TinyKG, we conduct comprehensive experiments on real-world datasets. We find that TinyKG with INT2 quantization aggressively reduces the memory footprint of activation maps by 7×, with only a 2% loss in accuracy, allowing us to deploy KGNNs on memory-constrained devices.
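
The abstract only sketches the core mechanism: compute the forward pass with exact activations, cache an unbiased low-bit copy of them, and dequantize that copy in the backward pass for gradient computation. The PyTorch snippet below is a minimal sketch of what such a scheme could look like for a single linear layer, assuming stochastic rounding as the unbiased quantizer and per-tensor min/max scaling; the class name MemoryEfficientLinear, the uint8 storage, and the 2-bit default are illustrative choices, not the paper's actual implementation.

```python
import torch


class MemoryEfficientLinear(torch.autograd.Function):
    """Linear layer that caches a quantized copy of its input activation.

    Illustrative sketch only: the forward pass is computed with exact
    activations, but only a stochastically rounded low-bit version of the
    input is saved for the backward pass, where it is dequantized before
    the weight gradient is computed.
    """

    @staticmethod
    def forward(ctx, x, weight, bits=2):
        out = x @ weight.t()                         # exact forward computation

        # Per-tensor min/max quantization (an assumption; the paper's
        # quantizer may differ in granularity and range handling).
        levels = 2 ** bits - 1
        x_min, x_max = x.min(), x.max()
        scale = (x_max - x_min).clamp(min=1e-8) / levels

        # Stochastic rounding: round up with probability equal to the
        # fractional remainder, so the quantizer is unbiased in expectation.
        q = torch.floor((x - x_min) / scale + torch.rand_like(x))

        # A real implementation would bit-pack q; uint8 storage is used
        # here only to keep the sketch short.
        ctx.save_for_backward(q.to(torch.uint8), weight, x_min, scale)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        q, weight, x_min, scale = ctx.saved_tensors
        x_hat = q.float() * scale + x_min            # dequantize to full precision
        grad_x = grad_out @ weight                   # gradient w.r.t. the input
        grad_w = grad_out.t() @ x_hat                # weight gradient uses the
                                                     # dequantized activation
        return grad_x, grad_w, None                  # no gradient for `bits`
```

A call such as out = MemoryEfficientLinear.apply(x, weight) would then behave like a normal linear layer in the forward pass while holding only the low-bit copy of x between the two passes; the same per-layer idea would apply to the aggregation layers of a KGNN, whose cached activation maps are the memory footprint the paper targets.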
