Paper Title
Online Malware Classification with System-Wide System Calls in Cloud IaaS
Paper Authors
Abstract
Accurately classifying malware in an environment allows cyber analysts to create better response and remediation strategies. However, classifying malware in a live environment is a difficult task because of the large number of system data sources. Collecting statistics from these separate sources and processing them together in a form that a machine learning model can use is difficult. Fortunately, all of these resources are mediated by the operating system's kernel. User programs, malware included, interact with system resources by making requests to the kernel through system calls. Collecting these system calls provides insight, in a single location, into the interaction with many system resources. Feeding these system calls into a performant model such as a random forest allows fast, accurate classification in certain situations. In this paper, we evaluate the feasibility of using system call sequences for online malware classification in both low-activity and heavy-use Cloud IaaS. We collect system calls as they are received by the kernel and take n-gram sequences of calls to use as features for tree-based machine learning models. We discuss the performance of the models on baseline systems with no extra running services and on systems under heavy load, as well as the performance gap between them.
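The n-gram featurization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`syscall_ngrams`, `build_vocabulary`, `to_feature_vector`), the window size `n=2`, and the two short traces are all assumptions chosen for illustration. It only shows how a sequence of system call names could be turned into a fixed-length count vector.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

NGram = Tuple[str, ...]


def syscall_ngrams(calls: Sequence[str], n: int) -> Counter:
    """Count every contiguous run of n system calls in one trace."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def build_vocabulary(traces: Iterable[Sequence[str]], n: int) -> List[NGram]:
    """Collect the sorted set of n-grams observed across all traces."""
    vocab = set()
    for trace in traces:
        vocab.update(syscall_ngrams(trace, n))
    return sorted(vocab)


def to_feature_vector(calls: Sequence[str], vocab: List[NGram], n: int) -> List[int]:
    """Map one trace to n-gram counts, ordered by the shared vocabulary."""
    counts = syscall_ngrams(calls, n)
    return [counts.get(gram, 0) for gram in vocab]


# Hypothetical traces; real traces would come from kernel-level collection.
trace_a = ["open", "read", "read", "close"]
trace_b = ["open", "read", "write", "close"]

vocab = build_vocabulary([trace_a, trace_b], n=2)
vec_a = to_feature_vector(trace_a, vocab, n=2)
vec_b = to_feature_vector(trace_b, vocab, n=2)
```

Vectors built this way have one column per observed n-gram, so they could be stacked into a matrix and passed to a tree-based classifier such as a random forest. The choice of `n` and whether to use raw counts or some other weighting are left open here.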