Paper Title

Leveraging Post Hoc Context for Faster Learning in Bandit Settings with Applications in Robot-Assisted Feeding

Authors

Gordon, Ethan K.; Roychowdhury, Sumegh; Bhattacharjee, Tapomayukh; Jamieson, Kevin; Srinivasa, Siddhartha S.

Abstract

Autonomous robot-assisted feeding requires the ability to acquire a wide variety of food items. However, it is impossible for such a system to be trained on all types of food in existence. Therefore, a key challenge is choosing a manipulation strategy for a previously unseen food item. Previous work showed that the problem can be represented as a linear bandit with visual context. However, food has a wide variety of multi-modal properties relevant to manipulation that can be hard to distinguish visually. Our key insight is that we can leverage the haptic context we collect during and after manipulation (i.e., "post hoc") to learn some of these properties and more quickly adapt our visual model to previously unseen food. In general, we propose a modified linear contextual bandit framework augmented with post hoc context observed after action selection to empirically increase learning speed and reduce cumulative regret. Experiments on synthetic data demonstrate that this effect is more pronounced when the dimensionality of the context is large relative to that of the post hoc context or when the post hoc context model is particularly easy to learn. Finally, we apply this framework to the bite acquisition problem and demonstrate the acquisition of 8 previously unseen types of food with 21% fewer failures across 64 attempts.
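The abstract describes the framework only at a high level. Below is a minimal, hypothetical sketch of the idea. It rests on two assumptions. First, the post hoc context p is a noisy linear function of the visual context x, and p is observed no matter which action is taken. Second, each action's expected reward is linear in p. For simplicity the sketch uses ridge regression and epsilon-greedy exploration. It is not the authors' algorithm or code. The names `PostHocGreedyBandit` and `simulate` and the parameters `d`, `k`, `reg`, `epsilon` and `noise` are illustrative only.

```python
import numpy as np


class PostHocGreedyBandit:
    """Hypothetical sketch of a linear bandit that exploits post hoc context.

    Assumed model (illustration only, not the paper's exact algorithm):
      post hoc context  p ~= W x          (W is k x d, shared by all actions)
      expected reward   r ~= theta_a . p  (one k-dim vector per action)
    """

    def __init__(self, n_actions, d, k, reg=1.0, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        # Ridge-regression statistics for the shared x -> p model.
        self.A_x = reg * np.eye(d)
        self.B_xp = np.zeros((d, k))
        # Ridge-regression statistics for each action's p -> reward model.
        self.A_p = [reg * np.eye(k) for _ in range(n_actions)]
        self.b_p = [np.zeros(k) for _ in range(n_actions)]

    def predict_post_hoc(self, x):
        # W^T solves min ||X W^T - P||^2 + reg * ||W||^2 (ridge regression).
        W_T = np.linalg.solve(self.A_x, self.B_xp)
        return x @ W_T

    def select(self, x):
        # Epsilon-greedy: explore uniformly with probability epsilon.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_actions))
        # Otherwise score each action on the *predicted* post hoc context,
        # since the real one is only available after acting.
        p_hat = self.predict_post_hoc(x)
        scores = [np.linalg.solve(A, b) @ p_hat for A, b in zip(self.A_p, self.b_p)]
        return int(np.argmax(scores))

    def update(self, x, action, post_hoc, reward):
        # Assumption: p is observed whichever action was taken, so the shared
        # x -> p model learns from every round.
        self.A_x += np.outer(x, x)
        self.B_xp += np.outer(x, post_hoc)
        # The reward model is updated only for the action actually taken.
        self.A_p[action] += np.outer(post_hoc, post_hoc)
        self.b_p[action] += reward * post_hoc


def simulate(T=500, n_actions=4, d=20, k=3, noise=0.05, seed=1):
    """Synthetic run: returns the agent, the true W, and cumulative regret."""
    rng = np.random.default_rng(seed)
    W_true = rng.normal(size=(k, d)) / np.sqrt(d)
    theta_true = rng.normal(size=(n_actions, k))
    agent = PostHocGreedyBandit(n_actions, d, k, seed=seed)
    regret = 0.0
    for _ in range(T):
        x = rng.normal(size=d)                 # visual context
        p = W_true @ x                         # true post hoc context
        expected = theta_true @ p              # expected reward of each action
        a = agent.select(x)
        reward = expected[a] + noise * rng.normal()
        regret += expected.max() - expected[a]
        agent.update(x, a, p + noise * rng.normal(size=k), reward)
    return agent, W_true, regret
```

The x to p model is shared across actions and refit from every round. Each per-action reward model only has to be learned in the k-dimensional post hoc space. This structure is one way to see why the abstract reports a larger benefit when d is large relative to k, although the sketch makes no quantitative claim.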
