Paper Title
Orloj: Predictably Serving Unpredictable DNNs
Paper Authors
Paper Abstract
Existing DNN serving solutions can provide tight latency SLOs while maintaining high throughput via careful scheduling of incoming requests, whose execution times are assumed to be highly predictable and data-independent. However, inference requests to emerging dynamic DNNs -- e.g., popular natural language processing (NLP) models and computer vision (CV) models that skip layers -- are data-dependent. They exhibit poor performance when served using existing solutions because they experience large variance in request execution times depending on the input -- the longest request in a batch inflates the execution times of the smaller ones, causing SLO misses in the absence of careful batching. In this paper, we present Orloj, a dynamic DNN serving system that captures this variance in dynamic DNNs using empirical distributions of expected request execution times, and then efficiently batches and schedules requests without knowing a request's precise execution time. Orloj significantly outperforms state-of-the-art serving solutions on high-variance dynamic DNN workloads, improving finish rate by 51--80% under tight SLO constraints and by over 100% under more relaxed SLO settings. For well-studied static DNN workloads, Orloj maintains performance comparable to the state of the art.
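The core idea in the abstract -- that a batch runs as long as its slowest request, so batching decisions must account for the tail of the execution-time distribution rather than a point estimate -- can be sketched as follows. This is an illustrative toy, not Orloj's actual algorithm: the quantile threshold, the Monte Carlo estimator, and the `profile` of per-request latencies are all hypothetical assumptions for the example.

```python
import random

def batch_time_quantile(samples, batch_size, q=0.95, trials=2000, rng=None):
    """Estimate the q-quantile of a batch's execution time by simulating
    the max of `batch_size` draws from an empirical latency distribution.
    The max matters because the slowest request dominates the batch."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch deterministic
    maxima = sorted(
        max(rng.choice(samples) for _ in range(batch_size))
        for _ in range(trials)
    )
    return maxima[int(q * (trials - 1))]

def fits_slo(samples, batch_size, deadline_ms, q=0.95):
    """Admit this batch size only if, at confidence q, the batch is
    still expected to finish before the tightest deadline in it."""
    return batch_time_quantile(samples, batch_size, q) <= deadline_ms

# Hypothetical latency profile: 98% of requests take 10 ms, 2% take 40 ms
profile = [10] * 98 + [40] * 2

print(fits_slo(profile, batch_size=1, deadline_ms=30))  # a lone request is very likely fast
print(fits_slo(profile, batch_size=8, deadline_ms=30))  # a large batch likely contains a slow request
```

The sketch shows why a mean-based estimate fails for dynamic DNNs: the average latency here (~10.6 ms) suggests any batch fits a 30 ms SLO, yet a batch of 8 has roughly a 15% chance of containing a 40 ms request, so the 95th-percentile batch time blows past the deadline.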