Controlling Perceived Emotion in Symbolic Music Generation with Monte Carlo Tree Search

Paper title

Controlling Perceived Emotion in Symbolic Music Generation with Monte Carlo Tree Search

Authors

Ferreira, Lucas N., Mou, Lili, Whitehead, Jim, Lelis, Levi H. S.

Abstract

This paper presents a new approach for controlling emotion in symbolic music generation with Monte Carlo Tree Search. We use Monte Carlo Tree Search as a decoding mechanism to steer the probability distribution learned by a language model towards a given emotion. At every step of the decoding process, we use Predictor Upper Confidence for Trees (PUCT) to search for sequences that maximize the average values of emotion and quality as given by an emotion classifier and a discriminator, respectively. We use a language model as PUCT's policy and a combination of the emotion classifier and the discriminator as its value function. To decode the next token in a piece of music, we sample from the distribution of node visits created during the search. We evaluate the quality of the generated samples with respect to human-composed pieces using a set of objective metrics computed directly from the generated samples. We also perform a user study to evaluate how human subjects perceive the generated samples' quality and emotion. We compare PUCT against Stochastic Bi-Objective Beam Search (SBBS) and Conditional Sampling (CS). Results suggest that PUCT outperforms SBBS and CS in almost all metrics of music quality and emotion.
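The decoding loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy vocabulary, the uniform `policy`, the hand-written `value` function, the `c_puct` constant, and the exact form of the selection score (the common AlphaZero-style PUCT rule) are all assumptions chosen only to keep the example self-contained. In the paper's setting, `policy` would be the language model and `value` would combine the emotion classifier and the discriminator.

```python
import math
import random

VOCAB = [0, 1, 2, 3]  # hypothetical toy token ids


def policy(seq):
    """Stand-in for the language model: a uniform prior over next tokens."""
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}


def value(seq):
    """Stand-in value function: mean of a toy 'emotion' and a toy 'quality' score."""
    # Toy emotion score: fraction of tokens equal to 3 (assumed target).
    emotion = sum(t == 3 for t in seq) / len(seq) if seq else 0.0
    # Toy quality score: fraction of adjacent token pairs that differ.
    pairs = list(zip(seq, seq[1:]))
    quality = sum(a != b for a, b in pairs) / len(pairs) if pairs else 1.0
    return 0.5 * (emotion + quality)


class Node:
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct):
    """Pick the child maximizing Q + c * P * sqrt(N_parent) / (1 + N_child)."""
    def score(item):
        _, child = item
        explore = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q() + explore
    return max(node.children.items(), key=score)


def search(prefix, n_sims, c_puct=1.0, max_len=8):
    """Run n_sims simulations from prefix; return visit counts of the root's children."""
    root = Node(prior=1.0)
    for _ in range(n_sims):
        node, seq, path = root, list(prefix), [root]
        # Selection: descend until reaching a node with no children.
        while node.children:
            tok, node = select_child(node, c_puct)
            seq.append(tok)
            path.append(node)
        # Expansion: add children weighted by the policy prior.
        if len(seq) < max_len:
            for tok, p in policy(seq).items():
                node.children[tok] = Node(p)
        # Evaluation and backup along the visited path.
        v = value(seq)
        for n in path:
            n.visits += 1
            n.value_sum += v
    return {tok: child.visits for tok, child in root.children.items()}


def decode_next(prefix, n_sims=50, rng=random):
    """Sample the next token in proportion to the root's child visit counts."""
    visits = search(prefix, n_sims)
    tokens = list(visits)
    weights = [visits[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Calling `decode_next([0])` repeatedly and appending each result to the prefix would produce a full sequence one token at a time. Sampling from the visit counts, rather than always taking the most visited child, keeps some diversity in the output while still favoring tokens the search found promising.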
