Paper Title

Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics

Authors

Jieming Zhu, Shilin He, Pinjia He, Jinyang Liu, Michael R. Lyu

Abstract

Logs have been widely adopted in software system development and maintenance because of the rich runtime information they record. In recent years, the growth in software size and complexity has led to a rapid increase in the volume of logs. To handle these large volumes of logs efficiently and effectively, a line of research focuses on developing intelligent and automated log analysis techniques. However, only a few of these techniques have been successfully deployed in industry, owing to the lack of public log datasets and of open benchmarking on them. To fill this significant gap and facilitate more research on AI-driven log analytics, we have collected and released loghub, a large collection of system log datasets. In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems, including distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. In this paper, we summarize the statistics of these datasets, introduce some practical usage scenarios of the loghub datasets, and present our benchmarking results on loghub to benefit researchers and practitioners in this field. At the time of writing, the loghub datasets have been downloaded roughly 90,000 times in total by hundreds of organizations from both industry and academia. The loghub datasets are available at https://github.com/logpai/loghub.
