Towards Benchmark Datasets for Machine Learning Based Website Phishing Detection: An experimental study

Paper Title

Towards Benchmark Datasets for Machine Learning Based Website Phishing Detection: An experimental study

Authors

Abdelhakim Hannousse, Salima Yahiouche

Abstract

In this paper, we present a general scheme for building reproducible and extensible datasets for website phishing detection. The aim is to (1) enable comparison of systems that use different features, (2) overcome the short-lived nature of phishing websites, and (3) keep track of the evolution of phishing tactics. To experiment with the proposed scheme, we start by adopting a refined classification of website phishing features. We systematically select a total of 87 commonly recognized features, classify them, and subject them to relevance and runtime analysis. We use the collected set of features to build a dataset following the proposed scheme. Thereafter, we use a conceptual replication approach to check whether former findings generalize to the built dataset. Specifically, we evaluate the performance of classifiers on individual feature classes and on combinations of classes, we investigate different combinations of models, and we explore the effects of filter and wrapper methods on the selection of discriminative features. The results show that Random Forest is the most predictive classifier. Features gathered from external services are found to be the most discriminative, whereas features extracted from web page contents are found to be less distinguishing. Besides external-service-based features, some web page content features are found to be time-consuming and not suitable for runtime detection. The use of hybrid features provided the best accuracy score of 96.61%. Among the feature selection methods investigated, filter-based ranking together with incremental removal of less important features improved the performance up to 96.83%, better than wrapper methods.
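
As a rough illustration of the "filter-based ranking together with incremental removal of less important features" idea mentioned in the abstract, the sketch below ranks features by a filter score and then drops the lowest-ranked ones one at a time, keeping the smallest subset that does not hurt validation accuracy. It is not the authors' implementation. The correlation-based score, the nearest-centroid classifier (used here in place of Random Forest to stay dependency-free), the toy data, and all function names are assumptions made for the example.

```python
import random
from statistics import mean, pstdev


def make_toy_data(n=200, seed=0):
    """Hypothetical toy data: columns 0-1 carry signal, columns 2-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        signal = [label + rng.gauss(0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0, 1) for _ in range(4)]
        X.append(signal + noise)
        y.append(label)
    return X, y


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant column."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    return mean((a - mx) * (b - my) for a, b in zip(xs, ys)) / (sx * sy)


def rank_features(X, y):
    """Filter step: feature indices ordered by |correlation with label|, best first."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def centroid_accuracy(X_tr, y_tr, X_val, y_val, cols):
    """Stand-in classifier (assumption): nearest class centroid on selected columns."""
    centroids = {}
    for label in set(y_tr):
        rows = [r for r, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [mean(r[j] for r in rows) for j in cols]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((row[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )

    hits = sum(predict(row) == truth for row, truth in zip(X_val, y_val))
    return hits / len(y_val)


def select_features(X_tr, y_tr, X_val, y_val):
    """Drop the lowest-ranked feature one at a time; keep the best-scoring subset."""
    ranking = rank_features(X_tr, y_tr)
    best_cols = list(ranking)
    best_acc = centroid_accuracy(X_tr, y_tr, X_val, y_val, best_cols)
    for k in range(len(ranking) - 1, 0, -1):
        cols = ranking[:k]  # the top-k features under the filter ranking
        acc = centroid_accuracy(X_tr, y_tr, X_val, y_val, cols)
        if acc >= best_acc:  # on ties, prefer the smaller subset
            best_cols, best_acc = cols, acc
    return ranking, best_cols, best_acc
```

A possible way to run it is to call `make_toy_data()`, split the rows into a training part and a validation part, and pass both to `select_features`. The subset is chosen on the validation split, so a fair accuracy estimate would need a further held-out test set.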
