Paper Title

Revisiting Pretraining Objectives for Tabular Deep Learning

Authors

Ivan Rubachev, Artem Alekberov, Yury Gorishniy, Artem Babenko

Abstract

Recent deep learning models for tabular data currently compete with the traditional ML models based on decision trees (GBDT). Unlike GBDT, deep models can additionally benefit from pretraining, which is a workhorse of DL for vision and NLP. For tabular problems, several pretraining methods were proposed, but it is not entirely clear if pretraining provides consistent noticeable improvements and what method should be used, since the methods are often not compared to each other or comparison is limited to the simplest MLP architectures. In this work, we aim to identify the best practices to pretrain tabular DL models that can be universally applied to different datasets and architectures. Among our findings, we show that using the object target labels during the pretraining stage is beneficial for the downstream performance and advocate several target-aware pretraining objectives. Overall, our experiments demonstrate that properly performed pretraining significantly increases the performance of tabular DL models, which often leads to their superiority over GBDTs.
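
The abstract mentions "target-aware pretraining objectives" without going into detail. As a rough illustration only, the sketch below shows one plausible form of such an objective: self-supervised reconstruction of corrupted features combined with an auxiliary head that predicts the training labels during pretraining. This is a minimal PyTorch sketch under assumed design choices; every name here (`TabularEncoder`, `pretrain_step`, the corruption scheme, the layer sizes) is an illustrative assumption, not the paper's implementation.

```python
# Illustrative sketch only: target-aware pretraining for a tabular MLP.
# Combines feature reconstruction on corrupted inputs (self-supervised)
# with an auxiliary head that predicts the training labels (target-aware).
# All names and hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularEncoder(nn.Module):
    """A plain MLP body producing a hidden representation of a row."""
    def __init__(self, n_features: int, d_hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

def pretrain_step(encoder, recon_head, target_head, x, y, corruption_rate=0.3):
    # Corrupt a random subset of feature cells by swapping in values
    # from other rows of the batch (a common tabular corruption scheme).
    mask = torch.rand_like(x) < corruption_rate
    shuffled = x[torch.randperm(x.size(0))]
    x_corrupted = torch.where(mask, shuffled, x)

    z = encoder(x_corrupted)
    loss_recon = F.mse_loss(recon_head(z), x)                 # rebuild clean features
    loss_target = F.mse_loss(target_head(z).squeeze(-1), y)  # also predict the label
    return loss_recon + loss_target

# Usage on synthetic regression data with 10 features.
n_rows, n_features, d_hidden = 256, 10, 128
x, y = torch.randn(n_rows, n_features), torch.randn(n_rows)
encoder = TabularEncoder(n_features, d_hidden)
recon_head = nn.Linear(d_hidden, n_features)
target_head = nn.Linear(d_hidden, 1)
params = [*encoder.parameters(), *recon_head.parameters(), *target_head.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)

for _ in range(100):
    optimizer.zero_grad()
    loss = pretrain_step(encoder, recon_head, target_head, x, y)
    loss.backward()
    optimizer.step()
# After pretraining, the auxiliary heads are discarded and the encoder is
# fine-tuned on the downstream task with a freshly initialized prediction head.
```

In this assumed setup, the label head is what makes the objective "target-aware": the representation is shaped by the downstream target already during pretraining, rather than by feature reconstruction alone.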
