PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites

Paper title

PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites

Authors

Purwanto, Rizka; Pal, Arindam; Blair, Alan; Jha, Sanjay

Abstract

Phishing has grown significantly in the past few years and is predicted to increase further. The dynamics of phishing make it challenging to implement a robust phishing detection system and to select features that still represent phishing as attacks change. In this paper, we propose PhishZip, a novel phishing detection approach that uses a compression algorithm to perform website classification. We also demonstrate a systematic way to construct the word dictionaries for the compression models using word-occurrence likelihood analysis. PhishZip outperforms the best-performing HTML-based features from past studies, with a true positive rate of 80.04%. We also propose using the compression ratio as a novel machine learning feature, which significantly improves machine-learning-based phishing detection over previous studies. Using compression ratios as additional features, the true positive rate improves by 30.3 percentage points (from 51.47% to 81.77%), and the accuracy increases by 11.84 percentage points (from 71.20% to 83.04%).
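The abstract does not specify which compressor PhishZip uses or its exact dictionary-selection rule. The sketch below is therefore only an illustration under stated assumptions. It uses Python's `zlib` (DEFLATE) with a preset dictionary (`zdict`). Words are ranked by how much more often they occur in one class's pages than in the other's. Each class gets its own dictionary, and a page is assigned to the class whose dictionary compresses it better, meaning a lower compressed-to-original size ratio. The tokenizer, the helper names (`build_dictionary`, `compression_ratio`, `classify`), the `top_k` cutoff, and the toy corpora are hypothetical and are not taken from the paper.

```python
import re
import zlib
from collections import Counter


def tokenize(html: str) -> list[str]:
    # Simplifying assumption: lowercase alphanumeric runs stand in for "words".
    return re.findall(r"[a-z0-9]+", html.lower())


def occurrence_likelihood(docs: list[str]) -> dict[str, float]:
    # Fraction of documents in which each word occurs at least once.
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return {w: c / len(docs) for w, c in counts.items()}


def build_dictionary(target: list[str], other: list[str], top_k: int = 50) -> bytes:
    # Hypothetical selection rule: rank words by how much more likely they are
    # to occur in the target class than in the other class.
    p_t = occurrence_likelihood(target)
    p_o = occurrence_likelihood(other)
    scored = sorted(((p_t[w] - p_o.get(w, 0.0), w) for w in p_t), reverse=True)
    words = [w for score, w in scored[:top_k] if score > 0]
    # zlib's documentation recommends placing the most useful strings at the
    # end of a preset dictionary, so the highest-ranked words go last.
    return " ".join(reversed(words)).encode()


def compression_ratio(text: str, zdict: bytes) -> float:
    # Compressed size / original size, with DEFLATE primed by a preset dictionary.
    data = text.encode()
    comp = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    compressed = comp.compress(data) + comp.flush()
    return len(compressed) / max(len(data), 1)


def classify(html: str, phish_dict: bytes, legit_dict: bytes) -> str:
    # The class whose dictionary compresses the page better (lower ratio) wins.
    r_phish = compression_ratio(html, phish_dict)
    r_legit = compression_ratio(html, legit_dict)
    return "phishing" if r_phish < r_legit else "legitimate"


# Toy corpora, invented for illustration only.
PHISHING = [
    "verify your account password login now",
    "confirm password update account suspended login",
    "urgent login verify password bank",
]
LEGIT = [
    "welcome to our blog about gardening tips",
    "read the latest news articles and blog posts",
    "contact us about our gardening services",
]

phish_dict = build_dictionary(PHISHING, LEGIT)
legit_dict = build_dictionary(LEGIT, PHISHING)
page = "please login and verify your password to keep your account"
# The two ratios can also be appended to a feature vector for an ML classifier.
features = [compression_ratio(page, phish_dict), compression_ratio(page, legit_dict)]
print(classify(page, phish_dict, legit_dict), features)
```

Using the per-class ratios as extra columns in an existing feature set corresponds to the abstract's second idea: compression ratio as a machine learning feature. How PhishZip actually combines these ratios with other features is not described here.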
