Paper Title
PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites
Authors
Abstract
Phishing has grown significantly in the past few years and is predicted to increase further. The dynamic nature of phishing makes it challenging both to implement a robust detection system and to select features that continue to represent phishing as attacks change. In this paper, we propose PhishZip, a novel phishing detection approach that uses a compression algorithm to classify websites, and we demonstrate a systematic way to construct the word dictionaries for the compression models using word occurrence likelihood analysis. PhishZip outperforms the best-performing HTML-based features from past studies, achieving a true positive rate of 80.04%. We also propose the compression ratio as a novel machine learning feature, which significantly improves machine-learning-based phishing detection over previous studies. Using compression ratios as additional features, the true positive rate improves by 30.3 percentage points (from 51.47% to 81.77%), while the accuracy increases by 11.84 percentage points (from 71.20% to 83.04%).
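The abstract does not give the exact compressor or the dictionary-selection formula, so the following is only a minimal Python sketch of the general idea under stated assumptions. DEFLATE from Python's standard `zlib` module, with a preset dictionary (`zdict`), stands in for each compression model. A smoothed document-frequency likelihood ratio stands in for the word occurrence likelihood analysis. The compression ratio is taken as compressed size divided by original size. All function names (`build_dictionary`, `compression_ratio`, `classify`, `compression_features`) and parameters (`top_k`, `smoothing`) are hypothetical.

```python
import re
import zlib
from collections import Counter


def tokenize(html: str) -> list[str]:
    # Assumption: lowercase alphanumeric runs; not the paper's tokenizer.
    return re.findall(r"[a-z0-9]+", html.lower())


def doc_freq(docs: list[str]) -> Counter:
    # Number of documents in which each word occurs at least once.
    counts: Counter = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return counts


def build_dictionary(target_docs: list[str], other_docs: list[str],
                     top_k: int = 50, smoothing: float = 1.0) -> bytes:
    """Hypothetical selection rule: keep the top_k words whose smoothed
    occurrence likelihood in target_docs most exceeds that in other_docs."""
    tf, of = doc_freq(target_docs), doc_freq(other_docs)
    nt, no = len(target_docs), len(other_docs)

    def likelihood_ratio(word: str) -> float:
        p_target = (tf[word] + smoothing) / (nt + 2 * smoothing)
        p_other = (of[word] + smoothing) / (no + 2 * smoothing)
        return p_target / p_other

    # Highest ratio first; ties broken alphabetically for determinism.
    ranked = sorted(tf, key=lambda w: (-likelihood_ratio(w), w))
    chosen = ranked[:top_k]
    # zlib's documentation recommends placing the most common strings at the
    # end of a preset dictionary, so the strongest words go last.
    return " ".join(reversed(chosen)).encode()


def compression_ratio(html: str, zdict: bytes) -> float:
    # Compressed size / original size, with the word list as a DEFLATE preset dictionary.
    data = html.encode()
    if not data:
        raise ValueError("empty document")
    comp = zlib.compressobj(level=9, zdict=zdict)
    compressed = comp.compress(data) + comp.flush()
    return len(compressed) / len(data)


def classify(html: str, phish_dict: bytes, legit_dict: bytes) -> str:
    # The model whose dictionary compresses the page better (lower ratio) wins.
    if compression_ratio(html, phish_dict) < compression_ratio(html, legit_dict):
        return "phishing"
    return "legitimate"


def compression_features(html: str, phish_dict: bytes, legit_dict: bytes) -> list[float]:
    # Two extra columns to append to an existing HTML/URL feature vector.
    return [compression_ratio(html, phish_dict), compression_ratio(html, legit_dict)]
```

In this sketch a page is labelled by whichever dictionary compresses it better, because text that shares many words with a model's dictionary can be encoded as back-references into it. The same two ratios can instead be appended to an existing feature vector and passed to any classifier, which corresponds to the abstract's second proposal of using compression ratio as an additional machine learning feature.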