Uncovering Feature Interdependencies in High-Noise Environments with Stepwise Lookahead Decision Forests

Paper Title

Uncovering Feature Interdependencies in High-Noise Environments with Stepwise Lookahead Decision Forests

Authors

Donick, Delilah; Lera, Sandro Claudio

Abstract

Conventionally, random forests are built from "greedy" decision trees, each of which considers only one split at a time during its construction. The sub-optimality of greedy implementations is well known, yet more sophisticated tree-building algorithms have not seen mainstream adoption. We examine under what circumstances an implementation of less greedy decision trees actually yields outperformance. To this end, a "stepwise lookahead" variation of the random forest algorithm is presented for its ability to better uncover binary feature interdependencies. In contrast to the greedy approach, each decision tree in this random forest algorithm simultaneously considers three split nodes in tiers of depth two. It is demonstrated on synthetic data and financial price time series that the lookahead version significantly outperforms the greedy one when (a) certain non-linear relationships between feature pairs are present and (b) the signal-to-noise ratio is particularly low. A long-short trading strategy for copper futures is then backtested by training both greedy and stepwise lookahead random forests to predict the signs of daily price returns. The resulting superior performance of the lookahead algorithm is at least partially explained by the presence of "XOR-like" relationships between long-term and short-term technical indicators. More generally, across all examined datasets, when no such relationships between features are present, performance across random forests is similar. Given its enhanced ability to capture the feature interdependencies present in complex systems, this lookahead variation is a useful extension to the toolkit of data scientists, in particular for financial machine learning, where conditions (a) and (b) are typically met.
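To make the contrast in the abstract concrete, here is a minimal sketch of why a depth-two lookahead can find an XOR-like relationship that a one-split-at-a-time search misses. It is not the authors' implementation: the function names (`best_greedy_split`, `best_lookahead_split`), the use of Gini impurity, and the noise-free toy grid are all assumptions made for illustration, and the forest-level parts (bootstrapping, feature subsampling, noise) are left out.

```python
from itertools import product

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def candidate_splits(rows):
    """Yield (feature, threshold) pairs at midpoints between distinct values."""
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            yield f, (lo + hi) / 2.0

def partition(rows, f, t):
    """Split rows into (left, right) on feature f at threshold t."""
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return left, right

def leaf_impurity(groups):
    """Size-weighted Gini impurity over a collection of row groups."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini([y for _, y in g]) for g in groups)

def best_greedy_split(rows):
    """Best single split; returns (impurity, split), split is None if nothing improves."""
    best = (leaf_impurity([rows]), None)
    for f, t in candidate_splits(rows):
        score = leaf_impurity(partition(rows, f, t))
        if score < best[0]:
            best = (score, (f, t))
    return best

def best_lookahead_split(rows):
    """Score each root split by the impurity of the four leaves reached after
    also splitting both children, i.e. three split nodes over two tiers."""
    n = len(rows)
    best = (leaf_impurity([rows]), None, None, None)
    for f, t in candidate_splits(rows):
        left, right = partition(rows, f, t)
        left_score, left_split = best_greedy_split(left)
        right_score, right_split = best_greedy_split(right)
        score = (len(left) * left_score + len(right) * right_score) / n
        if score < best[0]:
            best = (score, (f, t), left_split, right_split)
    return best

# Hypothetical toy data: a 4x4 grid with a pure XOR label on two features.
grid = [0.125, 0.375, 0.625, 0.875]
data = [((x1, x2), int((x1 > 0.5) != (x2 > 0.5))) for x1, x2 in product(grid, grid)]

greedy = best_greedy_split(data)
lookahead = best_lookahead_split(data)
print("greedy:", greedy)
print("lookahead:", lookahead)
```

On this toy grid every single split leaves both halves perfectly mixed, so the greedy search sees no impurity reduction at all, while the lookahead search picks a root split whose two child splits separate the classes completely. Given a root split, the two child splits can be optimized independently, which is why the joint search over three nodes reduces to a loop over root candidates.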
