论文标题
与线性内存的精确,可行的动态时间扭曲对齐
Exact, Parallelizable Dynamic Time Warping Alignment with Linear Memory
论文作者
论文摘要
音频对齐是许多MIR管道中的基本预处理步骤。对于具有M和N框架的两个音频剪辑,最受欢迎的方法是动态时间翘曲(DTW),在内存和计算中都具有O(MN)的要求,这对于以合理的速度对框架级别的对准而言非常敏感。为了解决这个问题,存在多种内存有效算法,以近似DTW成本下的最佳对齐。但是,据我们所知,没有确切的算法可以保证打破二次记忆障碍。在这项工作中,我们提出了一种划分和征服算法,该算法使用O(M+N)存储器计算确切的全球最佳DTW对齐。它的运行时间仍然是O(MN),将内存交易以增加2倍的计算增加。但是,该算法可以并行地将具有相同内存约束的最小值(m,n)并行,因此与具有足够GPU的教科书版本相比,它仍然可以更有效地运行。我们使用算法来计算一系列管弦乐音乐的确切对准,我们将其用作地面真理,以基准在以前无法使用的尺度上几个流行的几种近似对齐方案的对齐准确性。
Audio alignment is a fundamental preprocessing step in many MIR pipelines. For two audio clips with M and N frames, respectively, the most popular approach, dynamic time warping (DTW), has O(MN) requirements in both memory and computation, which is prohibitive for frame-level alignments at reasonable rates. To address this, a variety of memory efficient algorithms exist to approximate the optimal alignment under the DTW cost. To our knowledge, however, no exact algorithms exist that are guaranteed to break the quadratic memory barrier. In this work, we present a divide and conquer algorithm that computes the exact globally optimal DTW alignment using O(M+N) memory. Its runtime is still O(MN), trading off memory for a 2x increase in computation. However, the algorithm can be parallelized up to a factor of min(M, N) with the same memory constraints, so it can still run more efficiently than the textbook version with an adequate GPU. We use our algorithm to compute exact alignments on a collection of orchestral music, which we use as ground truth to benchmark the alignment accuracy of several popular approximate alignment schemes at scales that were not previously possible.