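Multi-Objective Evolutionary approach for the Performance Improvement of Learners using Ensembling Feature selection and Discretization Technique on Medical data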

Paper Title


Multi-Objective Evolutionary approach for the Performance Improvement of Learners using Ensembling Feature selection and Discretization Technique on Medical data

Authors

Singh, Deepak; Sisodia, Dilip Singh; Singh, Pradeep

Abstract

Biomedical data is filled with continuous real values. In the feature set, these values tend to cause problems such as underfitting, the curse of dimensionality, and an increased misclassification rate due to higher variance. In response, preprocessing techniques applied to the dataset minimize these side effects and have succeeded in maintaining adequate accuracy. Feature selection and discretization are two essential preprocessing steps that have been employed effectively to handle data redundancy in biomedical data. However, previous works made no unified effort to integrate feature selection and discretization when solving the data redundancy problem, which has left the field disjoint and fragmented. This paper proposes a novel multi-objective dimensionality reduction framework that incorporates both discretization and feature reduction in an ensemble model performing feature selection and discretization together. The multi-objective genetic algorithm NSGA-II governs both the selection of optimal features and the categorization of features in the selected subset as discretized or non-discretized. Two objectives serve as the fitness criteria: minimizing the error rate during feature selection and maximizing the information gain during discretization.
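The abstract describes the search only at a high level. A minimal sketch, assuming a three-valued gene per feature (0 = dropped, 1 = kept as continuous, 2 = kept and discretized), equal-width binning, and a leave-one-out 1-nearest-neighbour classifier as the error estimator, none of which the abstract specifies. It shows how one chromosome could be scored on the two stated objectives (error rate to minimize, information gain to maximize, negated here so both are minimized) and how NSGA-II's fast non-dominated sort ranks a population into Pareto fronts. All helper names are hypothetical; crowding distance, selection, crossover, and mutation are omitted.

```python
import math
from collections import Counter

# Hypothetical gene encoding (an assumption, not taken from the paper):
# 0 = feature dropped, 1 = kept as continuous, 2 = kept and discretized.

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def equal_width_bins(values, k=3):
    # Assumed discretizer: k equal-width intervals over the observed range.
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0
    return [min(int((v - lo) / width), k - 1) for v in values]

def information_gain(bins, labels):
    n = len(labels)
    cond = 0.0
    for b in set(bins):
        subset = [y for x, y in zip(bins, labels) if x == b]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

def loo_1nn_error(rows, labels):
    # Assumed error estimator: leave-one-out 1-NN with squared Euclidean distance.
    errors = 0
    for i, xi in enumerate(rows):
        nearest = min((j for j in range(len(rows)) if j != i),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, rows[j])))
        errors += labels[nearest] != labels[i]
    return errors / len(rows)

def evaluate(chromosome, X, y, k=3):
    """Return (error_rate, -information_gain) so both objectives are minimized."""
    columns, gain = [], 0.0
    for f, gene in enumerate(chromosome):
        col = [row[f] for row in X]
        if gene == 2:
            col = equal_width_bins(col, k)
            gain += information_gain(col, y)
        if gene != 0:
            columns.append(col)
    if not columns:
        return (1.0, 0.0)  # no features selected: worst possible error
    rows = [list(r) for r in zip(*columns)]
    return (loo_1nn_error(rows, y), -gain)

def dominates(a, b):
    return all(x <= z for x, z in zip(a, b)) and any(x < z for x, z in zip(a, b))

def non_dominated_sort(scores):
    """Fast non-dominated sort from NSGA-II: returns fronts as lists of indices."""
    dominated_by = [[] for _ in scores]  # solutions that p dominates
    count = [0] * len(scores)            # how many solutions dominate p
    fronts = [[]]
    for p in range(len(scores)):
        for q in range(len(scores)):
            if dominates(scores[p], scores[q]):
                dominated_by[p].append(q)
            elif dominates(scores[q], scores[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                count[q] -= 1
                if count[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]

# Toy data (illustrative only): two features, two classes.
X = [[1.0, 5.0], [1.2, 3.0], [3.8, 4.0], [4.0, 6.0]]
y = [0, 0, 1, 1]
population = [(0, 1), (1, 0), (2, 0), (2, 2), (1, 1)]
scores = [evaluate(c, X, y) for c in population]
print(non_dominated_sort(scores))  # [[3], [2], [1, 4], [0]]
```

In this toy run, chromosome `(2, 2)` sits alone in the first front because it reaches zero error with the highest total information gain. A full NSGA-II loop would repeat evaluation, sorting, and crowding-distance selection over many generations.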

