Paper Title

MLaaS4HEP: Machine Learning as a Service for HEP

Authors

Valentin Kuznetsov, Luca Giommi, Daniele Bonacorsi

Abstract

Machine Learning (ML) will play a significant role in the success of the upcoming High-Luminosity LHC (HL-LHC) program at CERN. In the next decade, LHC experiments will collect an unprecedented, exascale volume of data, and this effort will require novel approaches to training and using ML models. In this paper, we discuss a Machine Learning as a Service pipeline for HEP (MLaaS4HEP) that provides three independent layers: a data streaming layer that reads High-Energy Physics (HEP) data in their native ROOT data format; a data training layer that trains ML models using distributed ROOT files; and a data inference layer that serves predictions from pre-trained ML models via the HTTP protocol. This modular design makes it possible to train ML models at large scale by reading ROOT files from remote storage facilities, e.g. the Worldwide LHC Computing Grid (WLCG) infrastructure, and feeding the data to the user's favorite ML framework. The inference layer, implemented as TensorFlow as a Service (TFaaS), may provide easy access to pre-trained ML models in existing infrastructure and applications inside or outside the HEP domain. In particular, we demonstrate the usage of the MLaaS4HEP architecture for a physics use case, namely the $t\bar{t}$ Higgs analysis in CMS, originally performed using custom-made Ntuples. We provide details on the training of the ML model using distributed ROOT files, discuss the performance of the MLaaS and TFaaS approaches for the selected physics analysis, and compare the results with traditional methods.
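
The separation into three layers described in the abstract can be illustrated with a small, self-contained Python sketch. This is not the MLaaS4HEP or TFaaS code: all function names are hypothetical, in-memory lists stand in for remote ROOT files, a perceptron stands in for the user's ML framework, and the HTTP transport is omitted so that only the JSON request and response handling of an inference endpoint is shown.

```python
import json
from typing import Iterable, Iterator, List, Sequence, Tuple

# One event: (feature vector, label in {0, 1}). Hypothetical stand-in for ROOT data.
Event = Tuple[List[float], int]


def stream_batches(sources: Sequence[Iterable[Event]],
                   batch_size: int) -> Iterator[List[Event]]:
    """Streaming layer stand-in: yield fixed-size batches across several sources."""
    batch: List[Event] = []
    for source in sources:
        for event in source:
            batch.append(event)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def score(weights: List[float], features: List[float]) -> float:
    """Linear score; weights[0] is the bias term."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def train_perceptron(sources: Sequence[Sequence[Event]], n_features: int,
                     batch_size: int = 2, epochs: int = 10) -> List[float]:
    """Training layer stand-in: consumes batches without knowing where they come from.

    Sources must be re-iterable (e.g. lists), since each epoch streams them again.
    """
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for batch in stream_batches(sources, batch_size):
            for features, label in batch:
                predicted = 1 if score(weights, features) > 0 else 0
                error = label - predicted
                if error:  # classic perceptron update on misclassification
                    weights[0] += error
                    for i, x in enumerate(features):
                        weights[i + 1] += error * x
    return weights


def handle_request(weights: List[float], body: str) -> str:
    """Inference layer stand-in: JSON in, JSON out (what an HTTP handler would wrap)."""
    features = json.loads(body)["features"]
    label = 1 if score(weights, features) > 0 else 0
    return json.dumps({"prediction": label})
```

Because the training function only sees an iterator of batches, the data source can be swapped (local lists here, remote files in a real deployment) without touching the training or inference code, which is the point of keeping the layers independent.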
