DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep Surrogate Model

Paper Title

DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep Surrogate Model

Authors

Tuli, Shreshth; Casale, Giuliano; Cherkasova, Ludmila; Jennings, Nicholas R.

Abstract

The evolution of the edge computing paradigm has supported the emergence of latency-critical AI applications. However, edge solutions are typically resource-constrained, which poses reliability challenges due to heightened contention for compute and communication capacity and to faulty application behavior under overload conditions. Although the large volumes of log data these systems generate can be mined for fault prediction, labeling this data for training is a manual process and thus a limiting factor for automation. As a result, many companies resort to unsupervised fault-tolerance models. Yet, failure models of this kind can lose accuracy when they must adapt to non-stationary workloads and diverse host characteristics. To cope with this, we propose a novel modeling approach, called DeepFT, that proactively avoids system overloads and their adverse effects by optimizing task scheduling and migration decisions. DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system, and co-simulation-based self-supervised learning to dynamically adapt the model in volatile settings. It offers a highly scalable solution, as the model size grows by only 3% and 1% per unit increase in the number of active tasks and hosts, respectively. Extensive experiments on a Raspberry Pi-based edge cluster with DeFog benchmarks show that DeepFT can outperform state-of-the-art baseline methods in fault-detection and QoS metrics. Specifically, DeepFT achieves the highest F1 scores for fault detection, reducing service deadline violations by up to 37% while also improving response time by up to 9%.
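The abstract describes a loop in which simulation supplies fault labels without manual annotation, a surrogate model learns to predict faults from those labels, and the scheduler consults the surrogate when choosing placements. The following is a minimal sketch of that loop on a toy problem, not DeepFT's actual architecture: the deep surrogate is swapped for a one-feature logistic regression, the co-simulator is a hypothetical noisy-utilization model, migration is not modeled, and the names `co_simulate`, `LogisticSurrogate`, `expected_faults`, `best_assignment`, the task demands, host capacities, and the overload threshold of 1.0 are all assumptions made for illustration.

```python
import itertools
import math
import random

# Hypothetical toy setting: 3 identical hosts and 4 tasks with known CPU demands.
CAPACITY = [1.0, 1.0, 1.0]
DEMANDS = [0.6, 0.5, 0.4, 0.3]


def host_utilization(assignment):
    """Per-host utilization, where assignment[i] is the host running task i."""
    load = [0.0] * len(CAPACITY)
    for task, host in enumerate(assignment):
        load[host] += DEMANDS[task]
    return [l / c for l, c in zip(load, CAPACITY)]


def co_simulate(assignment, rng):
    """Toy co-simulator (assumed): a host is labeled faulty (1) if its
    utilization plus random workload noise exceeds 1.0. Labels come from
    simulation, not manual annotation: the self-supervised part of the sketch."""
    return [(u, 1 if u + rng.gauss(0.0, 0.1) > 1.0 else 0)
            for u in host_utilization(assignment)]


class LogisticSurrogate:
    """Stand-in for the deep surrogate: P(fault | u) = sigmoid(w * u + b)."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def prob(self, u):
        return 1.0 / (1.0 + math.exp(-(self.w * u + self.b)))

    def fit(self, samples, lr=1.0, epochs=2000):
        """Full-batch gradient descent on the mean log-loss."""
        n = len(samples)
        for _ in range(epochs):
            gw = gb = 0.0
            for u, y in samples:
                err = self.prob(u) - y  # d(log-loss)/d(logit)
                gw += err * u
                gb += err
            self.w -= lr * gw / n
            self.b -= lr * gb / n


def expected_faults(surrogate, assignment):
    """Surrogate estimate of the number of faulty hosts under an assignment."""
    return sum(surrogate.prob(u) for u in host_utilization(assignment))


def best_assignment(surrogate):
    """Exhaustively pick the placement with the fewest expected faults."""
    candidates = itertools.product(range(len(CAPACITY)), repeat=len(DEMANDS))
    return min(candidates, key=lambda a: expected_faults(surrogate, a))


rng = random.Random(0)
data = []
for _ in range(200):  # random placements, labeled by the co-simulator
    placement = [rng.randrange(len(CAPACITY)) for _ in DEMANDS]
    data.extend(co_simulate(placement, rng))

model = LogisticSurrogate()
model.fit(data)
plan = best_assignment(model)
print(plan, [round(u, 2) for u in host_utilization(plan)])
```

Exhaustive search over all `n_hosts ** n_tasks` placements is only viable at toy scale; larger instances would need a cheaper search over decisions. In this sketch, adapting to changing workloads would amount to refitting `LogisticSurrogate` on freshly co-simulated samples.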

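To make the scalability figure concrete, the helper below computes model size relative to a baseline deployment. The abstract does not say whether the per-unit increases add up or compound, so this sketch assumes a linear, additive reading (each extra active task adds 3% and each extra host adds 1% of the baseline size); `relative_model_size` is a hypothetical name.

```python
def relative_model_size(extra_tasks: int, extra_hosts: int,
                        task_rate: float = 0.03, host_rate: float = 0.01) -> float:
    """Model size relative to a baseline deployment.

    Assumes a linear, additive reading of the reported figures: every extra
    active task adds task_rate and every extra host adds host_rate of the
    baseline model size.
    """
    return 1.0 + task_rate * extra_tasks + host_rate * extra_hosts


# Hypothetical scenario: 10 more active tasks and 5 more hosts than the baseline.
print(round(relative_model_size(10, 5), 2))  # → 1.35
```

A compounding reading would instead give `1.03 ** 10 * 1.01 ** 5`, about 1.41, so the two interpretations diverge as the deployment grows.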

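For reference, the F1 score used to rank fault detectors is the harmonic mean of precision and recall. The sketch below computes it from binary per-interval fault labels (1 = host judged faulty in a scheduling interval); the label vectors are made-up illustration data, not results from the paper.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 3 detected faults, 1 false alarm, 1 missed fault.
actual   = [1, 0, 1, 1, 0, 0, 1, 0]
detected = [1, 0, 0, 1, 1, 0, 1, 0]
print(f1_score(actual, detected))  # → 0.75
```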