Phishing URL Detection Through Top-level Domain Analysis: A Descriptive Approach

Paper title

Phishing URL Detection Through Top-level Domain Analysis: A Descriptive Approach

Authors

Christou, Orestis; Pitropakis, Nikolaos; Papadopoulos, Pavlos; McKeown, Sean; Buchanan, William J.

Abstract

Phishing is considered to be one of the most prevalent cyber-attacks because of its immense flexibility and alarmingly high success rate. Even with adequate training and high situational awareness, it can still be hard for users to remain constantly aware of the URL of the website they are visiting. Traditional detection methods rely on blocklists and content analysis, both of which require time-consuming human verification. Thus, there have been attempts to focus on the predictive filtering of such URLs. This study aims to develop a machine-learning model, usable within the Splunk platform, that detects fraudulent URLs. Inspired by similar approaches in the literature, we trained Support Vector Machine (SVM) and Random Forest algorithms using malicious and benign datasets found in the literature and one dataset that we created. We evaluated the algorithms' performance in terms of precision and recall. Using only descriptive features, Random Forest reached up to 85% precision and 87% recall, while SVM achieved up to 90% precision and 88% recall.
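The abstract does not enumerate the descriptive features the authors used. As a hedged illustration only, the following Python sketch computes a few lexical URL features of the kind often used for phishing detection, including a naive top-level domain (TLD) field, together with the precision and recall metrics the abstract reports. The function names `extract_features` and `precision_recall` and the feature set are hypothetical and are not the authors' method. The TLD is taken as the last hostname label rather than looked up in the Public Suffix List, and the step of encoding these features and training an SVM or Random Forest is not shown.

```python
from urllib.parse import urlparse
import ipaddress


def extract_features(url: str) -> dict:
    """Compute simple descriptive (lexical) features of a URL.

    Hypothetical feature set for illustration; not the paper's exact features.
    """
    raw = url
    # urlparse only fills in the host when a scheme is present; assume http.
    if "://" not in url:
        url = "http://" + url
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is already lower-cased
    labels = host.split(".") if host else []

    # Phishing URLs sometimes use a raw IP address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False

    # Naive TLD: the last dot-separated label. A production system should use
    # the Public Suffix List so that e.g. "co.uk" is handled as one suffix.
    tld = "" if is_ip or not labels else labels[-1]

    return {
        "url_length": len(raw),
        "host_length": len(host),
        "num_subdomain_labels": 0 if is_ip else max(len(labels) - 2, 0),
        "num_hyphens_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in raw),
        "has_at_symbol": "@" in raw,
        "host_is_ip": is_ip,
        "path_depth": len([p for p in parsed.path.split("/") if p]),
        "tld": tld,
    }


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    print(extract_features("http://secure-login.example.tk/verify/account"))
    # 1 = phishing, 0 = benign (toy labels, not data from the paper)
    print(precision_recall([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Precision here answers "of the URLs flagged as phishing, how many really were", and recall answers "of the real phishing URLs, how many were flagged". This is why the abstract reports both. In practice the categorical `tld` field would need to be encoded (for example one-hot) before being passed to an SVM or Random Forest.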
