Paper title
Benchmarking and Performance Modelling of MapReduce Communication Pattern
Paper authors
Paper abstract
Understanding and predicting the performance of big data applications running in the cloud or on-premises can help minimise the overall cost of operations and create opportunities to identify performance bottlenecks. The complexity of the low-level internals of big data frameworks, together with the ubiquity of application and workload configuration parameters, makes comprehensive performance modelling solutions challenging and expensive to develop. In this paper, instead of focusing on a wide range of configurable parameters, we study the low-level internals of the MapReduce communication pattern and use a minimal set of performance drivers to develop a set of phase-level parametric models that approximate the execution time of a given application on a given cluster. The models can be used to infer the performance of unseen applications and to approximate it when an arbitrary dataset is used as input. Our approach is validated by running empirical experiments in two setups. On average, the error in both setups is within ±10% of the measured values.