MET: Masked Encoding for Tabular Data

Paper Title

MET: Masked Encoding for Tabular Data

Authors

Kushal Majmundar, Sachin Goyal, Praneeth Netrapalli, Prateek Jain

Abstract

We consider the task of self-supervised representation learning (SSL) for tabular data: tabular-SSL. Typical contrastive learning based SSL methods require instance-wise data augmentations which are difficult to design for unstructured tabular data. Existing tabular-SSL methods design such augmentations in a relatively ad-hoc fashion and can fail to capture the underlying data manifold. Instead of augmentations based approaches for tabular-SSL, we propose a new reconstruction based method, called Masked Encoding for Tabular Data (MET), that does not require augmentations. MET is based on the popular MAE approach for vision-SSL [He et al., 2021] and uses two key ideas: (i) since each coordinate in a tabular dataset has a distinct meaning, we need to use separate representations for all coordinates, and (ii) using an adversarial reconstruction loss in addition to the standard one. Empirical results on five diverse tabular datasets show that MET achieves a new state of the art (SOTA) on all of these datasets and improves up to 9% over current SOTA methods. We shed more light on the working of MET via experiments on carefully designed simple datasets.
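
The abstract's idea (i), a separate representation for every coordinate, combined with MAE-style masking, can be illustrated with a minimal sketch. It covers only input preparation and the standard reconstruction loss, not the encoder, decoder, or adversarial loss. The function names, the mask ratio, the embedding width, and the choice to concatenate each value with its coordinate embedding are all assumptions made for illustration and are not taken from the abstract.

```python
import numpy as np

def mask_and_embed(x, coord_emb, mask_ratio, rng):
    """Split one table row into visible tokens and masked coordinate indices.

    x         : (d,) feature values of a single row
    coord_emb : (d, e) one embedding per coordinate (hypothetical; these
                would be learnable parameters in a real model)
    """
    d = x.shape[0]
    n_masked = int(round(mask_ratio * d))
    perm = rng.permutation(d)
    masked_idx = np.sort(perm[:n_masked])
    visible_idx = np.sort(perm[n_masked:])
    # Assumption: each visible token is [feature value, embedding of its coordinate],
    # so the model can tell which column a value came from.
    values = x[visible_idx][:, None]
    tokens = np.concatenate([values, coord_emb[visible_idx]], axis=1)
    return tokens, visible_idx, masked_idx

def reconstruction_loss(x, x_hat):
    """Mean squared error over all coordinates (the standard loss only)."""
    return float(np.mean((x - x_hat) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=8)               # one row with 8 numeric features
coord_emb = rng.normal(size=(8, 4))  # stand-in for learned embeddings
tokens, visible_idx, masked_idx = mask_and_embed(x, coord_emb, 0.5, rng)
```

In a full model along these lines, an encoder would consume `tokens`, a decoder would predict all coordinates including those in `masked_idx`, and `reconstruction_loss` would compare that prediction against `x`.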
