DNN2LR: Interpretation-inspired Feature Crossing for Real-world Tabular Data

Paper title

DNN2LR: Interpretation-inspired Feature Crossing for Real-world Tabular Data

Authors

Liu, Zhaocheng; Liu, Qiang; Zhang, Haoli; Chen, Yuntian

Abstract

For the sake of reliability, models in real-world applications need to be both powerful and globally interpretable. Simple classifiers, e.g., Logistic Regression (LR), are globally interpretable, but not powerful enough to model complex nonlinear interactions among features in tabular data. Meanwhile, Deep Neural Networks (DNNs) have shown great effectiveness for modeling tabular data, but are not globally interpretable. In this work, we find that the local piece-wise interpretations of a specific feature in a DNN are usually inconsistent across samples, which is caused by feature interactions in the hidden layers. Accordingly, we can design an automatic feature crossing method that finds feature interactions in the DNN and uses them as cross features in LR. We give a definition of interpretation inconsistency in DNNs, based on which we propose a novel feature crossing method called DNN2LR. Extensive experiments have been conducted on four public datasets and two real-world datasets. The final model generated by DNN2LR, i.e., an LR model empowered with cross features, can outperform the complex DNN model as well as several state-of-the-art feature crossing methods. The experimental results strongly verify the effectiveness and efficiency of DNN2LR, especially on real-world datasets with large numbers of feature fields.
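The abstract's core observation is that a feature's local piece-wise interpretation in a ReLU DNN changes from sample to sample when that feature interacts with others in a hidden layer. A toy example can make this concrete. What follows is a minimal sketch under stated assumptions, not the DNN2LR algorithm. The network weights are hand-picked. The inconsistency score (the standard deviation of a feature's local linear coefficient across samples) is a hypothetical simplification. The pairwise crossing rule is also hypothetical, and a least-squares fit on the logits stands in for LR training.

```python
import itertools

import numpy as np

# Hypothetical toy ReLU network (hand-picked weights, NOT from the paper):
#   h = relu(W1 @ x + b1), f(x) = w2 @ h + b2
# Hidden unit 0 fires only when x0 and x1 are both 1 (an interaction);
# hidden unit 1 passes x2 through linearly for x2 in {0, 1}.
W1 = np.array([[1.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
b1 = np.array([-1.0, 1.0])
w2 = np.array([2.0, 1.0])
b2 = 0.0


def forward(x):
    """Output logit of the toy ReLU network for one sample."""
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2


def local_coefficients(x):
    """Local linear coefficients df/dx for one sample.

    A ReLU network is piece-wise linear, so inside the linear region that
    contains x the output equals coef @ x + const, with
    coef = W1^T (w2 * active_mask).
    """
    active = (W1 @ x + b1 > 0).astype(float)
    return W1.T @ (w2 * active)


# All 8 binary samples over three features.
X = np.array(list(itertools.product([0.0, 1.0], repeat=3)))
coefs = np.array([local_coefficients(x) for x in X])

# Assumed inconsistency score: std of each feature's local coefficient
# across samples (a simplified stand-in, not the paper's definition).
inconsistency = coefs.std(axis=0)
print("inconsistency per feature:", inconsistency.round(3))

# Hypothetical crossing rule: cross every pair of inconsistent features.
threshold = 1e-6
inconsistent = [j for j, s in enumerate(inconsistency) if s > threshold]
crosses = list(itertools.combinations(inconsistent, 2))
print("candidate crosses:", crosses)

# A linear model on raw features + crosses + intercept, fitted by least
# squares to the network's logits (stand-in for training an LR model).
cross_cols = [X[:, i] * X[:, j] for i, j in crosses]
design = np.column_stack([X, *cross_cols, np.ones(len(X))])
logits = np.array([forward(x) for x in X])
weights, *_ = np.linalg.lstsq(design, logits, rcond=None)
print("linear weights:", weights.round(3))
```

Features 0 and 1 get nonzero scores here because the hidden unit that reads them is active only when both are 1. Feature 2 keeps the same coefficient in every sample. Adding the single cross x0*x1 lets a purely linear model reproduce the network's logits exactly on these samples. That is the intuition behind using DNN-discovered interactions as LR cross features.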