Paper Title


Kinit Classification in Ethiopian Chants, Azmaris and Modern Music: A New Dataset and CNN Benchmark

Authors

Retta, Ephrem A., Sutcliffe, Richard, Almekhlafi, Eiad, Enku, Yosef K., Alemu, Eyob, Gemechu, Tigist D., Berwo, Michael A., Mhamed, Mustafa, Feng, Jun

Abstract


In this paper, we create EMIR, the first-ever Music Information Retrieval dataset for Ethiopian music. EMIR is freely available for research purposes and contains 600 sample recordings of Orthodox Tewahedo chants, traditional Azmari songs and contemporary Ethiopian secular music. Each sample is classified by five expert judges into one of four well-known Ethiopian Kinits, Tizita, Bati, Ambassel and Anchihoye. Each Kinit uses its own pentatonic scale and also has its own stylistic characteristics. Thus, Kinit classification needs to combine scale identification with genre recognition. After describing the dataset, we present the Ethio Kinits Model (EKM), based on VGG, for classifying the EMIR clips. In Experiment 1, we investigated whether Filterbank, Mel-spectrogram, Chroma, or Mel-frequency Cepstral coefficient (MFCC) features work best for Kinit classification using EKM. MFCC was found to be superior and was therefore adopted for Experiment 2, where the performance of EKM models using MFCC was compared using three different audio sample lengths. 3s length gave the best results. In Experiment 3, EKM and four existing models were compared on the EMIR dataset: AlexNet, ResNet50, VGG16 and LSTM. EKM was found to have the best accuracy (95.00%) as well as the fastest training time. We hope this work will encourage others to explore Ethiopian music and to experiment with other models for Kinit classification.
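A minimal sketch of the kind of front-end feature extraction the abstract describes, not the authors' implementation: it loads a 3-second excerpt (the best-performing clip length in Experiment 2) and computes the Mel-spectrogram/filterbank, Chroma, and MFCC features compared in Experiment 1. The file path, sample rate, and frame parameters are illustrative assumptions, using the librosa library.

```python
# Sketch only: extract the candidate features compared in Experiment 1
# from a 3-second clip. Paths and parameters are assumptions, not the
# settings used in the paper.
import librosa

AUDIO_PATH = "sample_clip.wav"  # hypothetical path to one EMIR recording
SR = 22050                      # assumed sample rate
CLIP_SECONDS = 3.0              # best-performing clip length (Experiment 2)

# Load only the first 3 seconds of the recording
y, sr = librosa.load(AUDIO_PATH, sr=SR, duration=CLIP_SECONDS)

# Mel-spectrogram (mel filterbank energies), log-compressed for CNN input
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel)

# Chroma features: energy in each of the 12 pitch classes per frame
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

# MFCCs: the feature found to work best for Kinit classification
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

print(log_mel.shape, chroma.shape, mfcc.shape)
```

Each feature matrix (frames x coefficients) would then be fed to a VGG-style CNN such as EKM as a single-channel image for four-way Kinit classification.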
