Paper Title

Handling Missing Data in Decision Trees: A Probabilistic Approach

Authors

Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck

Abstract

Decision trees are a popular family of models due to attractive properties such as interpretability and the ability to handle heterogeneous data. At the same time, missing data is a prevalent occurrence that hinders the performance of machine learning models. As such, handling missing data in decision trees is a well-studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the "expected prediction" of our models. At learning time, we fine-tune the parameters of already learned trees by minimizing their "expected prediction loss" with respect to our density estimators. We provide brief experiments showcasing the effectiveness of our methods compared to a few baselines.
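
As a rough illustration of the "expected prediction" idea at deployment time, the sketch below averages a tree's output over the missing features. It assumes binary features and a fully factorized (independent Bernoulli) distribution as the density estimator; the abstract only says "tractable density estimators", so this choice is a simplifying assumption, and the names Leaf, Split, expected_prediction, and p_one are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    value: float  # prediction stored at this leaf


@dataclass
class Split:
    feature: int                    # index of the binary feature tested here
    if_zero: Union["Split", Leaf]   # subtree followed when the feature is 0
    if_one: Union["Split", Leaf]    # subtree followed when the feature is 1


def expected_prediction(node: Union[Split, Leaf],
                        x: Dict[int, int],
                        p_one: Dict[int, float]) -> float:
    """Expected tree output given the observed features in x.

    x maps feature index -> 0/1 for observed features; missing features
    are simply absent. p_one[j] is the assumed marginal P(X_j = 1) under
    an independent-Bernoulli model (an assumption of this sketch).
    """
    if isinstance(node, Leaf):
        return node.value
    j = node.feature
    if j in x:
        # Feature is observed (or already fixed higher up): follow one branch.
        branch = node.if_one if x[j] == 1 else node.if_zero
        return expected_prediction(branch, x, p_one)
    # Feature is missing: weight both branches by its probability, fixing
    # its value below so that repeated tests of j stay consistent.
    e_one = expected_prediction(node.if_one, {**x, j: 1}, p_one)
    e_zero = expected_prediction(node.if_zero, {**x, j: 0}, p_one)
    return p_one[j] * e_one + (1.0 - p_one[j]) * e_zero


# Toy tree: test feature 0, then feature 1 on the "one" side.
tree = Split(0, if_zero=Leaf(0.0),
             if_one=Split(1, if_zero=Leaf(1.0), if_one=Leaf(3.0)))
p_one = {0: 0.5, 1: 0.25}

print(expected_prediction(tree, {}, p_one))      # both features missing
print(expected_prediction(tree, {0: 1}, p_one))  # only feature 1 missing
```

With every feature observed the function reduces to an ordinary tree prediction. With features missing, it returns the average over completions weighted by the assumed distribution, instead of imputing a single value.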
