Paper Title
Forgetful Forests: high performance learning data structures for streaming data under concept drift
Paper Authors
Paper Abstract
Database research can improve machine learning performance in many ways; one is to design better data structures. This paper combines incremental computation with sequential and probabilistic filtering to enable "forgetful" tree-based learning algorithms that cope with data under concept drift (i.e., data whose function from input to classification changes over time). The forgetful algorithms described in this paper run fast while maintaining high-quality predictions on streaming data. Specifically, the algorithms are up to 24 times faster than state-of-the-art incremental algorithms with at most a 2% loss of accuracy, or at least twice as fast without any loss of accuracy. This makes such structures suitable for high-volume streaming applications.