Paper title

Fully-Dynamic Decision Trees

Authors

Marco Bressan, Gabriel Damay, Mauro Sozio

Abstract

We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given $ε > 0$, our algorithm guarantees that, at every point in time, every node of the decision tree uses a split whose Gini gain is within an additive $ε$ of the optimum. For real-valued features, the algorithm has an amortized running time per insertion/deletion of $O\big(\frac{d \log^3 n}{ε^2}\big)$, which improves to $O\big(\frac{d \log^2 n}{ε}\big)$ for binary or categorical features, while using space $O(nd)$, where $n$ is the maximum number of examples at any point in time and $d$ is the number of features. Our algorithm is nearly optimal: we show that any algorithm with similar guarantees uses amortized running time $Ω(d)$ and space $\tilde{Ω}(nd)$. We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.
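To make the "within an additive ε of the optimum" guarantee concrete, here is a minimal Python sketch under stated assumptions. It uses the standard CART definition of Gini gain (parent impurity $1 - \sum_c p_c^2$ minus the size-weighted impurities of the two children), handles only binary features at a single node, and checks ε-optimality by recomputing every feature's gain from scratch. The class `BinaryFeatureNode` and its methods are hypothetical names chosen for illustration; this is a naive baseline, not the authors' algorithm, which maintains the guarantee at every node of the tree within the amortized bounds above and also handles real-valued thresholds.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def gini(counts: Counter) -> float:
    """Gini impurity 1 - sum_c p_c^2 of a label multiset given as class counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


class BinaryFeatureNode:
    """Hypothetical single-node helper for binary features (not the paper's algorithm).

    Keeps label counts for the whole node and, per feature j, for the two sides
    x[j] == 0 and x[j] == 1, so every split's Gini gain can be read off the counts.
    """

    def __init__(self, d: int) -> None:
        self.d = d
        self.total: Counter = Counter()
        # side_counts[j][v] = label counts of examples whose feature j equals v
        self.side_counts = [[Counter(), Counter()] for _ in range(d)]

    def _update(self, x: Sequence[int], y: Hashable, delta: int) -> None:
        # One counter update per feature: O(d) work per insertion or deletion.
        self.total[y] += delta
        for j, v in enumerate(x):
            self.side_counts[j][v][y] += delta

    def insert(self, x: Sequence[int], y: Hashable) -> None:
        self._update(x, y, +1)

    def delete(self, x: Sequence[int], y: Hashable) -> None:
        # Assumes (x, y) was previously inserted; no validation is done.
        self._update(x, y, -1)

    def gain(self, j: int) -> float:
        """Gini gain of splitting on feature j: parent impurity minus weighted child impurities."""
        n = sum(self.total.values())
        if n == 0:
            return 0.0
        left, right = self.side_counts[j]
        n_left, n_right = sum(left.values()), sum(right.values())
        return gini(self.total) - (n_left / n) * gini(left) - (n_right / n) * gini(right)

    def best_split(self) -> tuple[int, float]:
        """Exhaustively return the (feature, gain) pair with maximum Gini gain."""
        gains = [self.gain(j) for j in range(self.d)]
        best = max(range(self.d), key=gains.__getitem__)
        return best, gains[best]

    def refresh(self, current: int, eps: float) -> int:
        """Keep `current` if its gain is within an additive eps of the best; else switch."""
        best, best_gain = self.best_split()
        return current if self.gain(current) >= best_gain - eps else best


node = BinaryFeatureNode(d=2)
for x, y in [([0, 1], "a"), ([0, 0], "a"), ([1, 1], "b"), ([1, 0], "b")]:
    node.insert(x, y)
print(node.best_split())  # feature 0 separates the labels perfectly: gain 0.5
print(node.refresh(current=1, eps=0.6))  # 1: gain 0.0 is within 0.6 of 0.5
print(node.refresh(current=1, eps=0.1))  # 0: gain 0.0 is not within 0.1 of 0.5
```

The design choice worth noting is the tolerance: with ε > 0, a node can keep a slightly suboptimal split instead of switching whenever the best split changes. The sketch still pays a full recomputation on every check; avoiding that cost across a whole tree is what the paper's amortized analysis addresses.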
