Paper Title
AVMiner: Expansible and Semantic-Preserving Anti-Virus Labels Mining Method
Paper Authors
Paper Abstract
With the increase in the variety and quantity of malware, there is an urgent need to speed up malware diagnosis and analysis. Extracting malware family-related tokens from the AV (Anti-Virus) labels provided by online anti-virus engines paves the way for pre-diagnosing malware. Automatically extracting the vital information from AV labels would greatly enhance the detection ability of security enterprises and strengthen the research ability of security analysts. Recent works such as AVCLASS and AVCLASS2 try to extract malware attributes from AV labels and establish a taxonomy based on expert knowledge. However, because the trend of complicated malicious behaviors is uncertain, such a system needs the following abilities to face the challenge: preserving vital semantics, being expansible, and being free from expert knowledge. In this work, we present AVMiner, an expansible malware tagging system that can mine the most vital tokens from AV labels. AVMiner adopts natural language processing techniques and clustering methods to generate, without expert knowledge, a sequence of tokens ranked by importance. AVMiner can update itself when new samples arrive. Finally, we evaluate AVMiner on over 8,000 samples from well-known datasets with manually labeled ground truth, and it outperforms previous works.
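To make the idea of mining ranked tokens from AV labels concrete, the sketch below tokenizes the labels that several engines assign to one sample and ranks the tokens by how many engines mention them. It is not AVMiner's algorithm: plain engine-count ranking stands in for the natural language processing and clustering steps the abstract mentions. The engine names, the labels, and the `GENERIC` stop list are all made up for illustration. A hard-coded stop list is also exactly the kind of expert knowledge AVMiner aims to avoid; it is used here only to keep the sketch short.

```python
import re
from collections import Counter

# Hypothetical generic terms (assumption for illustration, not a list from the paper).
GENERIC = {"trojan", "win32", "malware", "generic", "gen", "variant", "virus", "a", "b"}


def tokenize(label):
    """Lowercase a label and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", label.lower()) if t]


def rank_tokens(labels):
    """Rank tokens by the number of engines whose label contains them."""
    counts = Counter()
    for label in labels.values():
        # Count each token once per engine so repeats inside one label
        # do not inflate its score.
        counts.update(set(tokenize(label)) - GENERIC)
    # Sort by descending engine count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up labels for a single sample, keyed by made-up engine names.
labels = {
    "EngineA": "Trojan.Win32.Zbot.a",
    "EngineB": "Win32/Zbot.gen",
    "EngineC": "Trojan-Spy.Zbot",
    "EngineD": "Generic.Malware.Banker",
}

ranked = rank_tokens(labels)
print(ranked)  # [('zbot', 3), ('banker', 1), ('spy', 1)]
```

In this toy input, `zbot` ranks first because three of the four engines mention it, while `banker` and `spy` each appear in a single label.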