Paper title
Demystifying Dependency Bugs in Deep Learning Stack
Paper authors
Paper abstract
Deep learning (DL) applications are built upon a heterogeneous and complex DL stack (e.g., Nvidia GPU, Linux, CUDA driver, Python runtime, and TensorFlow) and are therefore subject to software and hardware dependencies across that stack. One challenge in dependency management across the entire engineering lifecycle is posed by the asynchronous and radical evolution of dependencies and the complex version constraints among them. Developers may introduce dependency bugs (DBs) when selecting, using, and maintaining dependencies. However, the characteristics of DBs in the DL stack are still under-investigated, which hinders practical solutions to dependency management in the DL stack. To bridge this gap, this paper presents the first comprehensive study to characterize the symptoms, root causes, and fix patterns of DBs across the whole DL stack, using 446 DBs collected from StackOverflow posts and GitHub issues. For each DB, we first investigate the symptom as well as the lifecycle stage and dependency where the symptom is exposed. Then, we analyze the root cause as well as the lifecycle stage and dependency where the root cause is introduced. Finally, we explore the fix pattern and the knowledge sources that are used to fix it. Our findings from this study shed light on practical implications for dependency management.