Paper Title
Hierarchical Deep Reinforcement Learning for VWAP Strategy Optimization
Paper Authors
Abstract
Designing an intelligent volume-weighted average price (VWAP) strategy is a critical concern for brokers, since traditional rule-based strategies are relatively static and cannot achieve low transaction costs in a dynamic market. Many studies have tried to minimize the cost via reinforcement learning, but improvements have hit bottlenecks, especially for long-duration strategies such as the VWAP strategy. To address this issue, we propose a joint deep learning and hierarchical reinforcement learning architecture, termed Macro-Meta-Micro Trader (M3T), to capture market patterns and execute orders across different temporal scales. The Macro Trader first allocates a parent order into tranches based on volume profiles, as the traditional VWAP strategy does, but uses a long short-term memory (LSTM) neural network to improve forecasting accuracy. The Meta Trader then selects a short-term subgoal appropriate to the instant liquidity within each tranche to form a mini-tranche. Finally, the Micro Trader extracts the instant market state and fulfils the subgoal at the lowest transaction cost. Our experiments on stocks listed on the Shanghai Stock Exchange demonstrate that our approach outperforms baselines in terms of VWAP slippage, with an average cost saving of 1.16 basis points compared to the best baseline.
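The Macro Trader's role described above can be illustrated with a minimal sketch: an LSTM forecasts the intraday volume profile, and the parent order is split into tranches proportional to that forecast. Everything here (the use of PyTorch, the feature dimension, the 48 intraday bins, and the class and function names) is an illustrative assumption, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class VolumeProfileLSTM(nn.Module):
    """Illustrative LSTM mapping recent intraday volume features to a
    forecast volume profile (fractions over the day's bins)."""
    def __init__(self, n_features=4, hidden=32, n_bins=48):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_bins)

    def forward(self, x):                      # x: (batch, seq_len, n_features)
        _, (h, _) = self.lstm(x)
        logits = self.head(h[-1])              # (batch, n_bins)
        return torch.softmax(logits, dim=-1)   # fractions sum to 1

def allocate_parent_order(parent_qty, profile):
    """Macro-level allocation: split the parent order into per-bin tranches
    proportional to the forecast volume profile (rounded to whole shares)."""
    tranches = torch.round(parent_qty * profile)
    tranches[-1] += parent_qty - tranches.sum()   # absorb rounding residue
    return tranches

model = VolumeProfileLSTM()
hist_features = torch.randn(1, 20, 4)             # placeholder history window
profile = model(hist_features).squeeze(0)
print(allocate_parent_order(torch.tensor(10000.0), profile))
```

In this sketch the Meta and Micro Traders are omitted; they would consume each tranche, choose a short-term subgoal, and place child orders against the live order book.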
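For the evaluation metric, the following is a minimal sketch of how VWAP slippage is commonly computed in basis points for a buy-side parent order; the exact definition and sign convention used in the paper may differ, so the function below is an assumption.

```python
import numpy as np

def vwap_slippage_bps(exec_prices, exec_volumes, mkt_prices, mkt_volumes):
    """VWAP slippage in basis points for a buy order (assumed convention):
    positive means the strategy paid more than the market VWAP."""
    strategy_vwap = np.dot(exec_prices, exec_volumes) / np.sum(exec_volumes)
    market_vwap = np.dot(mkt_prices, mkt_volumes) / np.sum(mkt_volumes)
    return (strategy_vwap - market_vwap) / market_vwap * 1e4

# Toy example: two child fills compared against three market prints.
print(vwap_slippage_bps(
    exec_prices=np.array([10.02, 10.05]), exec_volumes=np.array([300, 200]),
    mkt_prices=np.array([10.00, 10.03, 10.06]), mkt_volumes=np.array([1000, 800, 1200]),
))
```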