A Novel Meta Learning Framework for Feature Selection using Data Synthesis and Fuzzy Similarity

Paper title

A Novel Meta Learning Framework for Feature Selection using Data Synthesis and Fuzzy Similarity

Authors

Zixiao Shen, Xin Chen, Jonathan M. Garibaldi

Abstract

This paper presents a novel meta learning framework for feature selection (FS) based on fuzzy similarity. The proposed method aims to recommend the best of four candidate FS methods for any given dataset. This is achieved by first constructing a large training data repository using data synthesis. Six meta features that represent the characteristics of each training dataset are then extracted, and the best FS method for each training dataset is used as its meta label. The meta features and the corresponding meta labels are subsequently used to train a classification model within a framework based on a fuzzy similarity measure. Finally, the trained model is used to recommend the most suitable FS method for a given unseen dataset. The proposed method was evaluated on eight public datasets from real-world applications. It recommended the best method for five datasets and the second-best method for one dataset, a result that outperformed any of the four individual FS methods. In addition, the proposed method is computationally efficient for algorithm selection, adding negligible time to the feature selection process. The paper thus contributes a novel method for effectively recommending which feature selection method to use for any new dataset.
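The recommendation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the six meta features, the four FS methods, or the exact fuzzy similarity measure, so everything below is an assumption. The sketch uses three made-up meta features scaled to [0, 1], two placeholder labels (`method_A`, `method_B`), and a common fuzzy Jaccard measure (sum of minima over sum of maxima). The names `fuzzy_similarity`, `recommend`, and `REPOSITORY` are hypothetical.

```python
from collections import defaultdict


def fuzzy_similarity(a, b):
    """Fuzzy Jaccard similarity: summed minima divided by summed maxima.

    Assumption: both vectors hold membership values scaled to [0, 1].
    This measure is a stand-in; the paper's own measure may differ.
    """
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero vectors are treated as identical.
    return 1.0 if denominator == 0 else numerator / denominator


def recommend(meta_features, repository):
    """Return the meta label whose training datasets are most similar on average."""
    scores = defaultdict(list)
    for features, label in repository:
        scores[label].append(fuzzy_similarity(meta_features, features))
    return max(scores, key=lambda label: sum(scores[label]) / len(scores[label]))


# Hypothetical repository of (meta-feature vector, best FS method) pairs.
# In the paper this repository is built by data synthesis; here it is hand-written.
REPOSITORY = [
    ([0.9, 0.1, 0.8], "method_A"),
    ([0.8, 0.2, 0.7], "method_A"),
    ([0.1, 0.9, 0.2], "method_B"),
    ([0.2, 0.8, 0.3], "method_B"),
]

if __name__ == "__main__":
    # Meta features of an unseen dataset (illustrative values only).
    print(recommend([0.85, 0.15, 0.75], REPOSITORY))
```

Because the recommendation only compares one meta-feature vector against stored vectors, its cost is small next to running a feature selection method, which is in line with the abstract's remark about negligible additional time.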
