Paper Title
Toward Expressive Singing Voice Correction: On Perceptual Validity of Evaluation Metrics for Vocal Melody Extraction
Paper Authors
Paper Abstract
Singing voice correction (SVC) is an appealing application for amateur singers. Commercial products automate SVC by snapping pitch contours to equal-tempered scales, which can produce deadpan results. Because these products also neglect rhythmic errors, extensive manual correction is still necessary. In this paper, we present a streamlined system that automates expressive SVC for both pitch and rhythmic errors. In particular, we extend previous work by integrating advanced techniques for singing voice separation (SVS) and vocal melody extraction. SVC is achieved by temporally aligning the source-target pair and then replacing the pitch and rhythm of the source with those of the target. We evaluate the framework through a comparative study of melody extraction that involves both subjective and objective evaluations, and we use it to investigate the perceptual validity of the standard metrics through the lens of SVC. The results suggest that high pitch accuracy as measured by these metrics does not imply good perceptual scores.
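To make two notions from the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a frame-wise pitch contour in Hz where 0 marks an unvoiced frame, an A4 = 440 Hz reference tuning, and the common 50-cent tolerance for raw pitch accuracy. The function names and the toy contour are hypothetical and chosen only for illustration.

```python
import math

A4_HZ = 440.0  # assumed reference tuning, not taken from the paper


def hz_to_cents(f_hz, ref_hz=A4_HZ):
    """Signed distance of f_hz from ref_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def snap_to_equal_temperament(f_hz):
    """Round a frequency to the nearest 12-tone equal-tempered pitch."""
    if f_hz <= 0:  # 0 marks an unvoiced frame in this sketch
        return 0.0
    semitones = round(hz_to_cents(f_hz) / 100.0)
    return A4_HZ * 2.0 ** (semitones / 12.0)


def raw_pitch_accuracy(ref_hz, est_hz, tol_cents=50.0):
    """Fraction of voiced reference frames whose estimate lies within tol_cents."""
    voiced = [(r, e) for r, e in zip(ref_hz, est_hz) if r > 0]
    if not voiced:
        return 0.0
    hits = sum(
        1 for r, e in voiced
        if e > 0 and abs(hz_to_cents(e, r)) <= tol_cents
    )
    return hits / len(voiced)


# Toy contour with small expressive deviations around A4 (hypothetical data).
contour = [436.0, 440.0, 445.0, 441.0, 0.0, 438.0]
# Snapping flattens every voiced frame onto the nearest scale pitch.
snapped = [snap_to_equal_temperament(f) for f in contour]
rpa = raw_pitch_accuracy(contour, snapped)
print(snapped, rpa)
```

In this toy case every voiced frame of the snapped contour stays within the 50-cent tolerance of the original, so the accuracy score is perfect even though all expressive variation has been removed. This only illustrates why a frame-wise pitch metric can be insensitive to such detail; it does not reproduce the paper's experiments.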