Paper Title
Moment-based Adversarial Training for Embodied Language Comprehension
Paper Authors
Paper Abstract
In this paper, we focus on a vision-and-language task in which a robot is instructed to execute household tasks. Given an instruction such as "Rinse off a mug and place it in the coffee maker," the robot is required to locate the mug, wash it, and put it in the coffee maker. This is challenging because the robot needs to decompose the instruction into subgoals and execute them in the correct order. On the ALFRED benchmark, the performance of state-of-the-art methods is still far lower than that of humans. This is partially because existing methods sometimes fail to infer subgoals that are not explicitly specified in the instruction. We propose Moment-based Adversarial Training (MAT), which uses two types of moments for perturbation updates in adversarial training. We apply MAT to the embedding spaces of the instruction, subgoals, and state representations to account for their variability. We validated our method on the ALFRED benchmark, and the results demonstrated that it outperformed the baseline method on all metrics.
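The abstract does not say which two moments MAT uses or how the perturbation is constrained. As a rough illustration only, the sketch below assumes Adam-style first and second moments of the loss gradient and an L2-norm bound on the perturbation. The function name `moment_based_perturbation`, the hyperparameters, and the toy quadratic loss are all hypothetical and are not taken from the paper.

```python
import numpy as np

def moment_based_perturbation(grad_fn, emb, steps=3, lr=0.1, eps=0.5,
                              beta1=0.9, beta2=0.999, tiny=1e-8):
    """Gradient ascent on a perturbation `delta` added to an embedding.

    ASSUMPTION: the "two types of moments" are Adam-style running
    estimates of the gradient mean and the squared-gradient mean.
    """
    delta = np.zeros_like(emb)
    m = np.zeros_like(emb)  # first moment: running mean of gradients
    v = np.zeros_like(emb)  # second moment: running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(emb + delta)            # loss gradient at the perturbed input
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)        # bias correction, as in Adam
        v_hat = v / (1 - beta2 ** t)
        # Ascent step: the perturbation is moved to increase the loss.
        delta = delta + lr * m_hat / (np.sqrt(v_hat) + tiny)
        # ASSUMPTION: keep the perturbation inside an L2 ball of radius eps.
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)
    return delta

# Toy stand-in for a model loss: 0.5 * ||x - target||^2, with gradient x - target.
target = np.array([1.0, -2.0, 0.5])
emb = np.zeros(3)
loss = lambda x: 0.5 * np.sum((x - target) ** 2)
grad = lambda x: x - target

delta = moment_based_perturbation(grad, emb)
```

In an adversarial training loop, the model would then be trained on `emb + delta` in addition to, or instead of, the clean embedding. Under the abstract's description, the same kind of update would be applied separately to the instruction, subgoal, and state embeddings.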