Fully-Dynamic Decision Trees

Paper title

Fully-Dynamic Decision Trees

Authors

Marco Bressan, Gabriel Damay, Mauro Sozio

Abstract

We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given $ε > 0$, our algorithm guarantees that, at every point in time, every node of the decision tree uses a split with Gini gain within an additive $ε$ of the optimum. For real-valued features the algorithm has an amortized running time per insertion/deletion of $O\big(\frac{d \log^3 n}{ε^2}\big)$, which improves to $O\big(\frac{d \log^2 n}{ε}\big)$ for binary or categorical features, while it uses space $O(n d)$, where $n$ is the maximum number of examples at any point in time and $d$ is the number of features. Our algorithm is nearly optimal, as we show that any algorithm with similar guarantees uses amortized running time $Ω(d)$ and space $\tilde{Ω}(n d)$. We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.
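
To make the guarantee in the abstract concrete, the sketch below computes the Gini gain of a threshold split and checks whether it lies within an additive ε of the best available split. It is not the paper's algorithm: it recomputes everything by brute force on a static list of examples, it assumes the standard textbook definition of Gini gain, and the function names (`gini`, `gini_gain`, `best_gain`, `is_eps_optimal`) and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of labels (0.0 if empty)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(examples, feature, threshold):
    """Gini gain of the split x[feature] <= threshold.

    `examples` is a non-empty list of (feature_vector, label) pairs.
    """
    labels = [y for _, y in examples]
    left = [y for x, y in examples if x[feature] <= threshold]
    right = [y for x, y in examples if x[feature] > threshold]
    n = len(labels)
    return (
        gini(labels)
        - len(left) / n * gini(left)
        - len(right) / n * gini(right)
    )


def best_gain(examples):
    """Brute-force maximum gain over all features and observed thresholds."""
    d = len(examples[0][0])
    return max(
        gini_gain(examples, j, x[j])
        for j in range(d)
        for x, _ in examples
    )


def is_eps_optimal(examples, feature, threshold, eps):
    """True if the split's gain is within an additive eps of the optimum."""
    return gini_gain(examples, feature, threshold) >= best_gain(examples) - eps


# Toy data (assumed for illustration): feature 0 separates the two labels.
examples = [
    ((0.1, 5.0), 0),
    ((0.4, 1.0), 0),
    ((0.6, 4.0), 1),
    ((0.9, 2.0), 1),
]

good = is_eps_optimal(examples, feature=0, threshold=0.4, eps=0.1)
bad = is_eps_optimal(examples, feature=1, threshold=2.0, eps=0.1)
```

On this toy data, `good` is `True` because the split on feature 0 separates the labels perfectly, while `bad` is `False` because the split on feature 1 leaves both sides evenly mixed. This check costs time quadratic in the number of examples per feature, which is the kind of full recomputation a dynamic algorithm with polylogarithmic update time has to avoid.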
