Paper Title


Multi-Agent Reinforcement Learning in a Realistic Limit Order Book Market Simulation

Authors

Michaël Karpe, Jin Fang, Zhongyao Ma, Chen Wang

Abstract


Optimal order execution is widely studied by industry practitioners and academic researchers because it determines the profitability of investment decisions and high-level trading strategies, particularly those involving large volumes of orders. However, complex and unknown market dynamics pose significant challenges for the development and validation of optimal execution strategies. In this paper, we propose a model-free approach by training Reinforcement Learning (RL) agents in a realistic market simulation environment with multiple agents. First, we configure a multi-agent historical order book simulation environment for execution tasks built on an Agent-Based Interactive Discrete Event Simulation (ABIDES) [arXiv:1904.12066]. Second, we formulate the problem of optimal execution in an RL setting where an intelligent agent can make order execution and placement decisions based on market microstructure trading signals in High Frequency Trading (HFT). Third, we develop and train an RL execution agent using the Double Deep Q-Learning (DDQL) algorithm in the ABIDES environment. In some scenarios, our RL agent converges towards a Time-Weighted Average Price (TWAP) strategy. Finally, we evaluate the simulation with our RL agent by comparing it with a market replay simulation using real market Limit Order Book (LOB) data.
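The abstract names Double Deep Q-Learning (DDQL) as the training algorithm. The paper's own networks, state features, and reward are not reproduced here; the sketch below only illustrates the generic Double DQN target computation (select the next action with the online network, evaluate it with the target network), with all array names and shapes assumed for illustration:

```python
import numpy as np

def ddql_target(rewards, next_q_online, next_q_target, gamma=0.99, done=None):
    """Generic Double DQN target (Van Hasselt et al., 2016):
    y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    rewards:       shape (batch,)
    next_q_online: shape (batch, n_actions), Q-values from the online net
    next_q_target: shape (batch, n_actions), Q-values from the target net
    done:          optional boolean mask for terminal transitions
    """
    # Action selection with the online network decouples selection
    # from evaluation, reducing the overestimation bias of plain DQN.
    best_actions = np.argmax(next_q_online, axis=1)
    next_values = next_q_target[np.arange(len(best_actions)), best_actions]
    if done is not None:
        next_values = np.where(done, 0.0, next_values)
    return rewards + gamma * next_values

# Toy batch of two transitions (values are arbitrary):
r = np.array([1.0, 1.0])
q_online = np.array([[1.0, 2.0], [3.0, 0.0]])   # argmax -> actions [1, 0]
q_target = np.array([[10.0, 20.0], [30.0, 40.0]])
print(ddql_target(r, q_online, q_target, gamma=0.5))  # → [11. 16.]
```

In the execution setting described by the abstract, the actions would correspond to order placement decisions and the rewards to execution-cost terms, but those specifics are the paper's, not this sketch's.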
