Data Imputation with Iterative Graph Reconstruction

Paper Title

Data Imputation with Iterative Graph Reconstruction

Authors

Jiajun Zhong, Weiwei Ye, Ning Gui

Abstract

Effective data imputation demands rich latent "structure" discovery capabilities from "plain" tabular data. Recent graph neural network-based imputation solutions show strong structure-learning potential by directly translating tabular data into bipartite graphs. However, because they lack relations between samples, these solutions treat all samples equally, which runs against one important observation: "similar samples should give more information about missing values." This paper presents a novel Iterative graph Generation and Reconstruction framework for Missing data imputation (IGRM). Instead of treating all samples equally, we introduce the concept of "friend networks" to represent different relations among samples. To generate an accurate friend network from data with missing values, an end-to-end friend network reconstruction solution is designed to allow continuous friend network optimization during imputation learning. The representation of the optimized friend network is, in turn, used to further optimize the data imputation process through differentiated message passing. Experimental results on eight benchmark datasets show that IGRM yields a 39.13% lower mean absolute error compared with nine baselines and 9.04% lower than the second-best. Our code is available at https://github.com/G-AILab/IGRM.
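The two graph views mentioned in the abstract can be illustrated with a minimal sketch. It is not the IGRM implementation: the function names `bipartite_edges` and `knn_friends` are hypothetical, and the friend network here is a fixed cosine-similarity k-nearest-neighbor graph on zero-filled raw values, an assumption made only for illustration. The abstract describes the friend network as being reconstructed end-to-end during imputation learning, which this sketch does not attempt.

```python
import numpy as np

def bipartite_edges(X):
    """Return (sample_idx, feature_idx, value) for every observed cell.

    Each observed cell becomes one edge between a sample node and a
    feature node; NaN cells (missing values) produce no edge.
    """
    rows, cols = np.nonzero(~np.isnan(X))
    return rows, cols, X[rows, cols]

def knn_friends(X, k):
    """Illustrative stand-in for a friend network (assumption, not IGRM).

    Picks the k most similar other samples per row, using cosine
    similarity on zero-filled values.
    """
    filled = np.nan_to_num(X, nan=0.0)
    norms = np.linalg.norm(filled, axis=1, keepdims=True)
    unit = filled / np.maximum(norms, 1e-12)  # guard against all-zero rows
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # a sample is not its own friend
    return np.argsort(-sim, axis=1)[:, :k]

# Toy table: 4 samples, 3 features, 2 missing cells.
X = np.array([
    [1.0, 2.0, np.nan],
    [1.1, np.nan, 3.0],
    [9.0, 0.5, 0.1],
    [1.0, 2.1, 2.9],
])

rows, cols, vals = bipartite_edges(X)
friends = knn_friends(X, k=1)
print(len(rows), "observed edges")
print(friends.ravel())
```

In this toy table, samples 0, 1, and 3 have similar observed values, so a similarity-based friend graph links them to each other rather than to sample 2. That is the intuition behind letting similar samples contribute more information about missing values.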
