Paper title
Non-Local Musical Statistics as Guides for Audio-to-Score Piano Transcription
Paper authors
Paper abstract
We present an automatic piano transcription system that converts polyphonic audio recordings into musical scores. This is a long-standing problem in music information processing, and recent studies have made remarkable progress in its two main component techniques: multipitch detection and rhythm quantization. Given this situation, we study a method that integrates deep-neural-network-based multipitch detection with statistical-model-based rhythm quantization. In the first part, we conducted systematic evaluations and found that, while the present method achieved high transcription accuracy at the note level, some global characteristics of the music, such as tempo scale, metre (time signature), and bar line positions, were often estimated incorrectly. In the second part, we formulated non-local statistics of pitch and rhythmic content that are derived from musical knowledge and studied their effects on inferring those global characteristics. We found that these statistics are markedly effective for improving the transcription results and that their optimal combination includes statistics obtained from separated hand parts. The integrated method achieved an overall transcription error rate of 7.1% and a downbeat F-measure of 85.6% on a dataset of popular piano music. The generated transcriptions can be partially used for music performance and for assisting human transcribers, which demonstrates the potential for practical applications.
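To make the downbeat F-measure quoted above concrete, the following is a minimal sketch of how such a score could be computed from two lists of downbeat times. It is an illustration under stated assumptions and not the paper's evaluation procedure: the function name `downbeat_f_measure`, the time-based greedy matching, and the 70 ms tolerance are all hypothetical choices made for this example.

```python
def downbeat_f_measure(estimated, reference, tolerance=0.07):
    """F-measure between estimated and reference downbeat times (seconds).

    Assumption: an estimate counts as correct if it lies within
    `tolerance` seconds of a not-yet-matched reference downbeat.
    Matching is greedy, in time order.
    """
    estimated = sorted(estimated)
    reference = sorted(reference)
    matched = 0
    i = j = 0
    while i < len(estimated) and j < len(reference):
        diff = estimated[i] - reference[j]
        if abs(diff) <= tolerance:
            # Pair this estimate with this reference downbeat.
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            # Estimate is too early to match any remaining reference.
            i += 1
        else:
            # Reference downbeat was missed by the estimates.
            j += 1
    if matched == 0:
        return 0.0
    precision = matched / len(estimated)
    recall = matched / len(reference)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

For example, with estimates `[0.0, 2.02, 4.5, 6.0]` and reference downbeats `[0.0, 2.0, 4.0, 6.0]`, three of four estimates match three of four references, so precision and recall are both 0.75.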