Sparsity Inducing Representations for Policy Decompositions

Paper Title

Sparsity Inducing Representations for Policy Decompositions

Authors

Ashwin Khadke, Hartmut Geyer

Abstract

Policy Decomposition (PoDec) is a framework that lessens the curse of dimensionality when deriving policies for optimal control problems. For a given system representation, i.e. the state variables and control inputs describing a system, PoDec generates strategies to decompose the joint optimization of policies for all control inputs. Policies for different inputs are thereby derived in a decoupled or cascaded fashion, each as a function of some subset of the state variables, which reduces computation. However, the choice of system representation is crucial, as it dictates the suboptimality of the resulting policies. We present a heuristic method to find a representation more amenable to decomposition. Our approach is based on two observations: every decomposition enforces a sparsity pattern in the resulting policies at the cost of optimality, and a representation that already leads to a sparse optimal policy is likely to produce decompositions with lower suboptimalities. As the optimal policy is not known, we construct a system representation that sparsifies its LQR approximation. For a simplified biped, a 4-degree-of-freedom manipulator, and a quadcopter, we discover decompositions that offer a 10% reduction in trajectory costs over those identified by vanilla PoDec. Moreover, the decomposition policies produce trajectories with substantially lower costs than policies obtained from state-of-the-art reinforcement learning algorithms.
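
To illustrate why the choice of representation affects policy sparsity, the following is a minimal sketch and not the authors' algorithm. It assumes a linear feedback policy u = -K x and a linear change of state coordinates y = T x, under which the same policy reads u = -(K T^{-1}) y. The matrices K and T are hypothetical values chosen purely for illustration (T is set equal to K so that the transformed gain becomes diagonal), and the helper names are not taken from the paper.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inv2(m):
    """Invert a 2x2 matrix exactly (entries are Fractions)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def count_nonzeros(m):
    """Number of nonzero entries, used here as a crude sparsity measure."""
    return sum(1 for row in m for v in row if v != 0)


# Hypothetical dense gain: both inputs depend on both state variables.
K = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(-1)]]

# Hypothetical change of coordinates y = T x (assumed invertible).
T = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(-1)]]

# The same policy expressed in the new coordinates: u = -(K T^{-1}) y.
K_new = matmul(K, inv2(T))

print("nonzeros in K:    ", count_nonzeros(K))
print("nonzeros in K_new:", count_nonzeros(K_new))
```

In the new coordinates each input depends on a single state variable, which is the sparsity pattern a fully decoupled decomposition would enforce. For a real system, the gain would come from an LQR solution and the transformation would have to be searched for rather than read off directly.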
