Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models

Paper Title

Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models

Authors

Lamb, Alex; Islam, Riashat; Efroni, Yonathan; Didolkar, Aniket; Misra, Dipendra; Foster, Dylan; Molu, Lekan; Chari, Rajan; Krishnamurthy, Akshay; Langford, John

Abstract

In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information. For example, a person walking along a city street who tries to model all aspects of the world would quickly be overwhelmed by a multitude of shops, cars, and people moving in and out of view, each following their own complex and inscrutable dynamics. Is it possible to turn the agent's firehose of sensory information into a minimal latent state that is both necessary and sufficient for an agent to successfully act in the world? We formulate this question concretely, and propose the Agent Control-Endogenous State Discovery algorithm (AC-State), which has theoretical guarantees and is practically demonstrated to discover the minimal control-endogenous latent state which contains all of the information necessary for controlling the agent, while fully discarding all irrelevant information. This algorithm consists of a multi-step inverse model (predicting actions from distant observations) with an information bottleneck. AC-State enables localization, exploration, and navigation without reward or demonstrations. We demonstrate the discovery of the control-endogenous latent state in three domains: localizing a robot arm with distractions (e.g., changing lighting conditions and background), exploring a maze alongside other agents, and navigating in the Matterport house simulator.
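
The objective described in the abstract (predict the first action from an encoded observation pair k steps apart, then prefer the smallest encoding that predicts equally well) can be illustrated with a toy tabular sketch. Everything here is an assumption made for illustration, not the paper's implementation: the 1-D corridor with an uncontrollable noise bit, the count-based inverse model, the three hand-written candidate encoders, and names such as `rollout`, `inverse_loss`, and the 0.02 tolerance. A learned encoder with a discrete bottleneck would replace the hand-written candidates in practice.

```python
import numpy as np
from collections import Counter, defaultdict

N_POS, MAX_K, N_STEPS = 5, 3, 20000  # toy sizes, chosen only for illustration

def rollout(rng):
    """Random-policy trajectory; observation = (controllable position, exogenous bit)."""
    pos, bit = 0, 0
    obs, acts = [], []
    for _ in range(N_STEPS):
        obs.append((pos, bit))
        a = int(rng.integers(2))                       # 0 = left, 1 = right
        acts.append(a)
        pos = min(max(pos + (1 if a else -1), 0), N_POS - 1)
        bit = int(rng.integers(2))                     # noise the agent cannot influence
    return obs, acts

def inverse_loss(encode, obs, acts):
    """Empirical cross-entropy of a count-based model of P(a_t | f(x_t), f(x_{t+k}), k)."""
    counts = defaultdict(Counter)
    for k in range(1, MAX_K + 1):
        for t in range(len(obs) - k):
            counts[(encode(obs[t]), encode(obs[t + k]), k)][acts[t]] += 1
    total = sum(sum(c.values()) for c in counts.values())
    nll = 0.0
    for c in counts.values():
        n = sum(c.values())
        nll -= sum(m * np.log(m / n) for m in c.values())
    return nll / total

# Candidate encoders f, each paired with the number of latent states it uses.
encoders = {
    "full": (lambda x: x, 2 * N_POS),   # keeps position and noise
    "endo": (lambda x: x[0], N_POS),    # keeps only the controllable position
    "exo": (lambda x: x[1], 2),         # keeps only the noise
}

obs, acts = rollout(np.random.default_rng(0))
losses = {name: inverse_loss(f, obs, acts) for name, (f, _) in encoders.items()}
best = min(losses.values())
# Bottleneck step: among near-optimal encoders, keep the one with the fewest latent states.
selected = min((n for n in encoders if losses[n] <= best + 0.02),
               key=lambda n: encoders[n][1])
print(losses, selected)
```

In this sketch the noise-only encoder predicts actions no better than chance, while the position-only encoder matches the full observation up to estimation error and uses half as many states, so the bottleneck step selects it.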
