Paper title
Runtime vs Scheduler: Analyzing Dask's Overheads
Paper authors
Paper abstract
Dask is a distributed task framework that data scientists commonly use to parallelize Python code on computing clusters with little programming effort. It uses a sophisticated work-stealing scheduler that has been hand-tuned to execute task graphs as efficiently as possible. But is scheduler optimization a worthwhile effort for Dask? We show on many real-world task graphs that even a completely random scheduler is surprisingly competitive with Dask's built-in scheduler, and that Dask's main bottleneck lies in its runtime overhead. We develop a drop-in replacement for the Dask central server, written in Rust, that is backwards compatible with existing Dask programs. Thanks to its efficient runtime, our server implementation scales to larger clusters than Dask and consistently outperforms it on a variety of task graphs, even though it uses a simpler scheduling algorithm.
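To make the notion of a "completely random scheduler" concrete, the following is a minimal sketch in plain Python. It is not the paper's Rust server and does not use any Dask API. The function `random_schedule`, the dictionary-based graph format, and the four-task example graph are all assumptions introduced here for illustration. The sketch only captures the general idea: once a task's dependencies are satisfied, it is handed to a uniformly random worker, with no regard for load, data locality, or task cost.

```python
import random
from collections import deque


def random_schedule(graph, n_workers, seed=0):
    """Assign every task in a DAG to a uniformly random worker.

    `graph` maps each task name to the list of tasks it depends on
    (a hypothetical format chosen for this sketch).
    Returns (task, worker) pairs in a dependency-respecting order.
    """
    rng = random.Random(seed)
    # Count unmet dependencies and record which tasks wait on each task.
    pending = {task: len(deps) for task, deps in graph.items()}
    dependents = {task: [] for task in graph}
    for task, deps in graph.items():
        for dep in deps:
            dependents[dep].append(task)

    ready = deque(sorted(t for t, n in pending.items() if n == 0))
    assignment = []
    while ready:
        task = ready.popleft()
        # The entire scheduling policy: pick any worker at random.
        assignment.append((task, rng.randrange(n_workers)))
        for child in dependents[task]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(assignment) != len(graph):
        raise ValueError("graph contains a cycle")
    return assignment


# Hypothetical graph: two loads feed a merge, which feeds a report.
example_graph = {
    "load_a": [],
    "load_b": [],
    "merge": ["load_a", "load_b"],
    "report": ["merge"],
}
schedule = random_schedule(example_graph, n_workers=3)
```

Because the placement decision is a single random draw per task, the scheduling cost per task is close to constant. That is why such a baseline helps separate the effect of scheduling quality from the effect of runtime overhead.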