Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration

Paper Title

Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration

Authors

Ju Fan, Tongyu Liu, Guoliang Li, Junyou Chen, Yuwei Shen, Xiaoyong Du

Abstract

The proliferation of big data has brought an urgent demand for privacy-preserving data publishing. Traditional solutions to this demand are limited in how effectively they balance the tradeoff between the privacy and the utility of the released data. Thus, the database and machine learning communities have recently studied a new problem, relational data synthesis using generative adversarial networks (GAN), and have proposed various algorithms. However, these algorithms have not been compared under the same framework, so it is hard for practitioners to understand GAN's benefits and limitations. To bridge this gap, we conduct what is so far the most comprehensive experimental study of applying GAN to relational data synthesis. We introduce a unified GAN-based framework and define a space of design solutions for each component in the framework, including neural network architectures and training strategies. We conduct extensive experiments to explore the design space and to compare with traditional data synthesis approaches. From these experiments, we find that GAN is very promising for relational data synthesis, and we provide guidance for selecting appropriate design solutions. We also point out limitations of GAN and identify future research directions.
