Paper title
Network On Network for Tabular Data Classification in Real-world Applications
Authors
Abstract
Tabular data is the most common data format adopted by our customers, in industries ranging from retail and finance to e-commerce, and tabular data classification plays an essential role in their businesses. In this paper, we present Network On Network (NON), a practical tabular data classification model based on deep neural networks that provides accurate predictions. Various deep methods have been proposed, and promising progress has been made. However, most of them use operations such as neural networks and factorization machines to fuse the embeddings of different features directly, and then linearly combine the outputs of those operations to obtain the final prediction. As a result, both the intra-field information and the non-linear interactions between those operations (e.g., neural networks and factorization machines) are ignored. Intra-field information is the information that the features within a field belong to that same field. NON is proposed to take full advantage of intra-field information and non-linear interactions. It consists of three components: a field-wise network at the bottom to capture the intra-field information, an across-field network in the middle to choose suitable operations in a data-driven manner, and an operation fusion network on top to deeply fuse the outputs of the chosen operations. Extensive experiments on six real-world datasets demonstrate that NON can significantly outperform state-of-the-art models. Furthermore, both qualitative and quantitative studies of the features in the embedding space show that NON can capture intra-field information effectively.
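The three-layer structure described in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the class name `NONSketch`, the helpers `dense` and `apply_dense`, the layer sizes, the random initialisation, and the particular set of across-field operations (a linear map, one dense layer, and a sum of pairwise inner products standing in for a factorization-machine-style interaction) are all assumptions chosen for illustration. In particular, the sketch uses a fixed set of operations rather than selecting them in a data-driven manner, and it omits training entirely.

```python
import math
import random

random.seed(0)  # fixed seed so the illustrative weights are reproducible


def dense(n_in, n_out):
    """Hypothetical helper: random weights and zero biases for a dense layer."""
    weights = [[random.gauss(0.0, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def apply_dense(layer, x, relu=True):
    """Apply a dense layer to vector x, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


class NONSketch:
    """Illustrative three-part forward pass (assumed structure, untrained)."""

    def __init__(self, field_sizes, dim=4, hidden=8):
        n_fields = len(field_sizes)
        # Bottom: one embedding table and one small network PER FIELD,
        # so features of the same field share field-specific parameters.
        self.tables = [[[random.gauss(0.0, 0.1) for _ in range(dim)]
                        for _ in range(size)] for size in field_sizes]
        self.field_nets = [dense(dim, dim) for _ in field_sizes]
        # Middle: several operations over all field outputs.
        self.linear_op = dense(n_fields * dim, 1)
        self.dnn_op = dense(n_fields * dim, hidden)
        # Top: a non-linear network over the concatenated operation outputs
        # (1 linear value + `hidden` DNN values + 1 pairwise value).
        self.fusion_hidden = dense(hidden + 2, hidden)
        self.fusion_out = dense(hidden, 1)

    def forward(self, feature_ids):
        # Field-wise network: look up each embedding, then transform it
        # with the network that belongs to its field.
        fields = [apply_dense(net, table[idx])
                  for table, net, idx
                  in zip(self.tables, self.field_nets, feature_ids)]
        flat = [v for field in fields for v in field]
        # Across-field network: apply each operation to the field outputs.
        linear_out = apply_dense(self.linear_op, flat, relu=False)
        dnn_out = apply_dense(self.dnn_op, flat)
        pairwise_out = [sum(sum(a * b for a, b in zip(fields[i], fields[j]))
                            for i in range(len(fields))
                            for j in range(i + 1, len(fields)))]
        # Operation fusion network: fuse the outputs non-linearly instead
        # of summing them linearly.
        fused = apply_dense(self.fusion_hidden,
                            linear_out + dnn_out + pairwise_out)
        logit = apply_dense(self.fusion_out, fused, relu=False)[0]
        return 1.0 / (1.0 + math.exp(-logit))  # predicted probability


# Three categorical fields with 3, 5, and 2 distinct values (made-up sizes).
model = NONSketch(field_sizes=[3, 5, 2])
prob = model.forward([0, 4, 1])
```

The design point the sketch tries to show is the final step: the operation outputs are concatenated and passed through a further non-linear layer, whereas the earlier models the abstract describes would combine them linearly.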