Paper title
Privacy-preserving feature selection: A survey and proposing a new set of protocols
Authors
Abstract
Feature selection is the process of sieving features: informative features are separated from redundant and irrelevant ones. This process plays an important role in machine learning, data mining, and bioinformatics. However, traditional feature selection methods can only process centralized datasets and cannot satisfy today's distributed data processing needs. These needs call for a new category of data processing algorithms, called privacy-preserving feature selection, which protect users' data by revealing no part of it, either during intermediate processing or in the final results. This is vital for datasets that contain individuals' data, such as medical datasets. It is therefore reasonable either to modify existing algorithms or to propose new ones that can be applied to distributed datasets and that also handle users' data responsibly by protecting their privacy. In this paper, we review three privacy-preserving feature selection methods and, where a gap is identified, suggest ways to improve their performance. We also propose a privacy-preserving feature selection method based on rough set feature selection. The proposed method can process both horizontally and vertically partitioned datasets in two-party and multi-party scenarios.
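
To make the abstract's terminology concrete, the following minimal sketch shows what horizontal and vertical partitioning mean for a toy table, and computes the standard rough set dependency degree, gamma_P(D) = |POS_P(D)| / |U|, on the full table. It is an illustration only: the computation is centralized and non-private, it is not the protocol proposed in the paper, and the toy data and the names `dependency_degree`, `party_a_rows`, `party_b_rows`, `party_a_cols`, and `party_b_cols` are assumptions introduced here.

```python
from collections import defaultdict


def dependency_degree(rows, cond_idx, dec_idx):
    """Rough set dependency degree: |positive region| / |universe|.

    Plain, centralized computation (no privacy protection).
    """
    # Group rows into equivalence classes by their values on the
    # chosen condition features.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        classes[key].append(row[dec_idx])
    # A class belongs to the positive region when all of its rows
    # share a single decision value.
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)


# Hypothetical toy table: columns are (f0, f1, decision).
table = [
    (0, 0, "no"),
    (0, 1, "yes"),
    (1, 0, "yes"),
    (1, 1, "yes"),
    (0, 0, "no"),
    (1, 0, "no"),
]

# Horizontal partitioning: each party holds some rows, with all columns.
party_a_rows, party_b_rows = table[:3], table[3:]

# Vertical partitioning: each party holds some columns, for all rows.
party_a_cols = [(r[0],) for r in table]
party_b_cols = [(r[1], r[2]) for r in table]

# Dependency of the decision on both features, and on f0 alone.
gamma_both = dependency_degree(table, [0, 1], 2)
gamma_f0 = dependency_degree(table, [0], 2)
```

In this toy table, four of the six rows fall into equivalence classes with a single decision value when both features are used, while `f0` alone leaves every class inconsistent. A privacy-preserving method would need to obtain such quantities without any party revealing its rows or columns to the others.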