Paper Title
Semi-Supervised Learning for In-Game Expert-Level Music-to-Dance Translation
Paper Authors
Paper Abstract
Music-to-dance translation is a new and powerful feature in recent role-playing games. Players can now let their characters dance along with specified music clips and even generate fan-made dance videos. Previous work on this topic treats music-to-dance as a supervised motion generation problem based on time-series data. However, these methods suffer from limited paired training data and from motion degradation. This paper offers a new perspective on the task: we reformulate the translation problem as a piece-wise dance phrase retrieval problem based on choreography theory. With this design, players can further edit the dance movements on top of our generated result, whereas other regression-based methods ignore such user interactivity. Because dance motion capture is an expensive and time-consuming procedure that requires the assistance of professional dancers, we train our method under a semi-supervised learning framework with a large collected unlabeled dataset (20x larger than the labeled data). A co-ascent mechanism is introduced to improve the robustness of our network. Using this unlabeled dataset, we also introduce self-supervised pre-training so that the translator can understand the melody, rhythm, and other components of music phrases. We show that pre-training significantly improves translation accuracy compared with training from scratch. Experimental results suggest that our method not only generalizes well across various styles of music but also achieves expert-level choreography for game players.
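To make the retrieval framing concrete, the following is a minimal sketch of piece-wise dance phrase retrieval. It is not the paper's implementation. The fixed phrase length, the mean-pooling `embed_phrase` function (a stand-in for a learned music encoder), the cosine-similarity matching, and the toy `bank` of dance phrase embeddings are all assumptions made for illustration.

```python
import math


def split_into_phrases(frames, phrase_len):
    """Cut a per-frame sequence into consecutive, non-overlapping phrases.

    Trailing frames that do not fill a whole phrase are dropped.
    """
    last_start = len(frames) - phrase_len
    return [frames[i:i + phrase_len] for i in range(0, last_start + 1, phrase_len)]


def embed_phrase(phrase):
    """Hypothetical stand-in for a learned music encoder: mean of frame vectors."""
    dim = len(phrase[0])
    return [sum(frame[d] for frame in phrase) / len(phrase) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_dance_phrases(frames, phrase_len, dance_bank):
    """Return one dance phrase id per music phrase.

    dance_bank maps a phrase id to an embedding assumed to live in the
    same space as the music phrase embeddings.
    """
    choreography = []
    for phrase in split_into_phrases(frames, phrase_len):
        query = embed_phrase(phrase)
        best_id = max(dance_bank, key=lambda pid: cosine(query, dance_bank[pid]))
        choreography.append(best_id)
    return choreography


# Toy data: four music frames with 2-D features, two candidate dance phrases.
music = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
bank = {"spin": [1.0, 0.0], "step": [0.0, 1.0]}
choreography = retrieve_dance_phrases(music, 2, bank)
print(choreography)
```

Because the output is a sequence of discrete phrase identifiers rather than regressed joint positions, a player could in principle swap or reorder individual entries afterward, which is the kind of editability the abstract contrasts with regression-based methods.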