Paper Title
Avast-CTU Public CAPE Dataset
Paper Authors
Paper Abstract
Only a limited amount of publicly available data supports research in malware analysis technology. In particular, there are virtually no publicly available datasets generated by rich sandboxes such as Cuckoo/CAPE. The benefit of a dynamic sandbox is that it realistically simulates file execution on the target machine and produces a log of that execution. Because the machine can be infected by malware, there is a good chance of capturing malicious behavior in the execution logs, which allows researchers to study that behavior in detail. Although the subsequent analysis of log information is extensively covered in industrial cybersecurity backends, to our knowledge academia has invested only limited effort in advancing such log analysis capabilities with cutting-edge techniques. We make this sample dataset available to support the design of new machine learning methods for malware detection, especially for the automatic detection of generic malicious behavior. The dataset was collected in cooperation between Avast Software and the Czech Technical University - AI Center (AIC).