Paper Title
Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals
Paper Authors
Paper Abstract
In this work, we propose a bi-directional long short-term memory (BiLSTM) network based COVID-19 detection method using breath/speech/cough signals. By training the network on each type of acoustic signal separately, we build an individual model for each of the three tasks; the parameters of these models are then averaged to obtain an average model, which is used as the initialization for the BiLSTM model training of each task. This initialization method significantly improves the performance on all three tasks, surpassing the official baseline results. In addition, we utilize the publicly available pre-trained wav2vec2.0 model and further pre-train it on the official DiCOVA datasets. This wav2vec2.0 model is used to extract high-level features of the sound as the model input, replacing conventional mel-frequency cepstral coefficient (MFCC) features. Experimental results reveal that using the high-level features together with MFCC features improves performance. To improve performance further, we also deploy several preprocessing techniques, such as silent segment removal, amplitude normalization, and time-frequency masking. The proposed detection model is evaluated on the DiCOVA dataset, and results show that our method achieves an area under the curve (AUC) score of 88.44% on the blind test set in the fusion track.
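The cross-task parameter-averaging initialization described above could be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of PyTorch, the layer sizes, and the helper names are all assumptions.

```python
import torch
import torch.nn as nn

def average_state_dicts(state_dicts):
    """Element-wise average of several models' parameters (same architecture)."""
    avg = {}
    for key in state_dicts[0]:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

def make_model():
    # Hypothetical per-task BiLSTM; input size 39 assumes 39-dim MFCC frames.
    return nn.LSTM(input_size=39, hidden_size=64,
                   bidirectional=True, batch_first=True)

# One model per task: breath, cough, speech.
models = [make_model() for _ in range(3)]

# Average the three sets of parameters into a single "average model".
avg_sd = average_state_dicts([m.state_dict() for m in models])

# Use the averaged parameters to re-initialize each task-specific model
# before its final task-specific training.
for m in models:
    m.load_state_dict(avg_sd)
```

In practice the averaging would be applied after a first round of per-task training, and each re-initialized model would then be fine-tuned on its own task.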
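The preprocessing steps mentioned in the abstract (silent segment removal, amplitude normalization, and time-frequency masking) could look roughly like this sketch. The frame length, energy threshold, and mask widths are hypothetical choices, not values from the paper:

```python
import numpy as np

def remove_silence(signal, frame_len=400, energy_thresh=1e-4):
    """Drop non-overlapping frames whose mean energy is below a threshold
    (frame length and threshold are illustrative, not from the paper)."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    kept = frames[np.mean(frames ** 2, axis=1) >= energy_thresh]
    return kept.reshape(-1) if kept.size else signal

def normalize_amplitude(signal):
    """Scale the waveform so its peak absolute amplitude is 1."""
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

def time_freq_mask(spec, max_f=8, max_t=10, rng=None):
    """SpecAugment-style masking: zero one random frequency band and one
    random time span of a (freq, time) spectrogram."""
    rng = np.random.default_rng() if rng is None else rng
    spec = spec.copy()
    f0 = int(rng.integers(0, spec.shape[0] - max_f))
    t0 = int(rng.integers(0, spec.shape[1] - max_t))
    spec[f0:f0 + int(rng.integers(1, max_f + 1)), :] = 0.0
    spec[:, t0:t0 + int(rng.integers(1, max_t + 1))] = 0.0
    return spec
```

Silence removal and normalization operate on the raw waveform before feature extraction, while time-frequency masking is a training-time augmentation applied to the spectrogram-like features.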