Auto-TransRL: Autonomous Composition of Vision Pipelines for Robotic Perception

Paper Title

Auto-TransRL: Autonomous Composition of Vision Pipelines for Robotic Perception

Authors

Aditya Kapoor, Nijil George, Vartika Sengar, Vighnesh Vatsal, Jayavardhana Gubbi

Abstract

Creating a vision pipeline for different datasets to solve a computer vision task is a complex and time-consuming process. Currently, these pipelines are developed with the help of domain experts. Moreover, there is no systematic structure for constructing a vision pipeline apart from relying on experience, trial and error, or template-based approaches. Because the search space of suitable algorithms for a particular vision task is large, human exploration for a good solution requires time and effort. To address these issues, we propose a dynamic and data-driven way to identify an appropriate set of algorithms for building a vision pipeline that achieves the goal task. We introduce a Transformer architecture complemented with deep reinforcement learning to recommend algorithms that can be incorporated at different stages of the vision workflow. This system is both robust and adaptive to dynamic changes in the environment. Experimental results further show that our method also generalizes well to recommend algorithms that were not used during training, and hence alleviates the need to retrain the system on a new set of algorithms introduced at test time.
