Paper Title

MLOS: An Infrastructure for Automated Software Performance Engineering

Authors

Carlo Curino, Neha Godwal, Brian Kroth, Sergiy Kuryata, Greg Lapinski, Siqi Liu, Slava Oks, Olga Poppe, Adam Smiechowski, Ed Thayer, Markus Weimer, Yiwen Zhu

Abstract

Developing modern systems software is a complex task that combines business logic programming and Software Performance Engineering (SPE). The latter is an experimental and labor-intensive activity focused on optimizing the system for a given hardware, software, and workload (hw/sw/wl) context. Today's SPE is performed during build/release phases by specialized teams, and is cursed by: 1) a lack of standardized and automated tools, 2) significant repeated work as the hw/sw/wl context changes, 3) fragility induced by "one-size-fits-all" tuning (where improvements on one workload or component may impact others). The net result: despite costly investments, system software is often outside its optimal operating point, anecdotally leaving 30% to 40% of performance on the table. Recent developments in Data Science (DS) hint at an opportunity: combining DS tooling and methodologies with a new developer experience to transform the practice of SPE. In this paper we present MLOS, an ML-powered infrastructure and methodology to democratize and automate Software Performance Engineering. MLOS enables continuous, instance-level, robust, and trackable systems optimization. MLOS is being developed and employed within Microsoft to optimize SQL Server performance. Early results indicate that component-level optimizations can lead to 20%-90% improvements when custom-tuning for a specific hw/sw/wl, hinting at a significant opportunity. However, several research challenges remain that will require community involvement. To this end, we are in the process of open-sourcing the MLOS core infrastructure, and we are engaging with academic institutions to create an educational program around Software 2.0 and MLOS ideas.
