Mastering the working sequence in human-robot collaborative assembly based on reinforcement learning

Paper Title

Mastering the working sequence in human-robot collaborative assembly based on reinforcement learning

Authors

Yu, Tian; Huang, Jing; Chang, Qing

Abstract

A long-standing goal of human-robot collaboration (HRC) in manufacturing systems is to increase collaborative working efficiency. In line with the Industry 4.0 trend toward smart manufacturing systems, the co-robot in an HRC system should be designed to be more self-organized and to reach superhuman proficiency through self-learning. Inspired by machine learning algorithms developed by Google DeepMind, such as AlphaGo Zero, this paper formats the human-robot collaborative assembly process as a chessboard, in which the selection of moves is analogous to the decisions made by both the human and the robot during assembly. To obtain the optimal working-sequence policy that maximizes working efficiency, the robot is trained with a self-play algorithm based on reinforcement learning, without guidance or domain knowledge beyond the game rules. A neural network is also trained to predict the distribution of move-selection priorities and whether a given working sequence is the one that maximizes HRC efficiency. An adjustable desk assembly is used to demonstrate the proposed HRC assembly algorithm and its efficiency.
