Paper title
Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment
Paper authors
Abstract
Audio-to-score alignment aims to generate an accurate mapping between a performance audio recording and the score of a given piece. Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features, which cannot be adapted to different acoustic conditions. We propose a method that overcomes this limitation by using learned frame similarity for audio-to-score alignment. We focus on offline audio-to-score alignment of piano music. Experiments on music data from different acoustic conditions demonstrate that our method achieves higher alignment accuracy than a standard DTW-based method that uses handcrafted features, and that it generates robust alignments while remaining adaptable to different domains.
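To make the DTW baseline mentioned in the abstract concrete, here is a minimal sketch of offline alignment over a frame-level cost matrix. It is an illustration under stated assumptions, not the paper's implementation: the function names `cosine_distance_matrix` and `dtw_path` and the toy three-dimensional feature vectors are hypothetical, and cosine distance stands in for a handcrafted frame similarity. In a learned-similarity setup, the cost matrix would presumably be produced by a trained model instead, with the DTW step left unchanged.

```python
import math


def cosine_distance_matrix(audio_frames, score_frames):
    """Pairwise cosine distance between audio and score feature frames.

    Assumption: this is a stand-in for a handcrafted frame similarity;
    a learned similarity would replace this function.
    """
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [unit(v) for v in audio_frames]
    s = [unit(v) for v in score_frames]
    return [[1.0 - sum(x * y for x, y in zip(u, w)) for w in s] for u in a]


def dtw_path(cost):
    """Return the minimum-cost warping path as (audio_idx, score_idx) pairs."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] is the cheapest cumulative cost of aligning the first i audio
    # frames with the first j score frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev

    # Backtrack from the end of both sequences to the start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # advance both sequences
            (acc[i - 1][j], i - 1, j),          # advance audio only
            (acc[i][j - 1], i, j - 1),          # advance score only
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return path[::-1]


# Toy data: the performance holds the first score event for two frames.
score = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
audio = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
alignment = dtw_path(cosine_distance_matrix(audio, score))
print(alignment)
```

Each pair in the returned path maps an audio frame index to a score frame index, so a repeated score index indicates that the performance lingers on that score position.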