Title
COVID-19
A large-scale and PCR-referenced vocal audio dataset for COVID-19
Authors
Abstract
The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during the dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and to 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. The dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma and 27.20% having linked influenza PCR test results.
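The PCR coverage implied by the reported counts can be checked with simple arithmetic. The snippet below is a minimal sketch: the counts are taken from the abstract, while the constant names and the `linkage_rate` helper are hypothetical and used only for illustration.

```python
# Counts reported in the abstract; the names are illustrative only.
TOTAL_PARTICIPANTS = 72_999
PCR_LINKED_PARTICIPANTS = 70_794
TOTAL_POSITIVE_CASES = 25_776
PCR_LINKED_POSITIVE_CASES = 24_155


def linkage_rate(linked: int, total: int) -> float:
    """Return the share of records with a linked PCR result, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * linked / total


participant_rate = linkage_rate(PCR_LINKED_PARTICIPANTS, TOTAL_PARTICIPANTS)
positive_rate = linkage_rate(PCR_LINKED_POSITIVE_CASES, TOTAL_POSITIVE_CASES)
unlinked_participants = TOTAL_PARTICIPANTS - PCR_LINKED_PARTICIPANTS

print(f"Participants with a linked PCR result: {participant_rate:.2f}%")
print(f"Positive cases with a linked PCR result: {positive_rate:.2f}%")
print(f"Participants without a linked PCR result: {unlinked_participants}")
```

Run as-is, this gives about 96.98% of participants and 93.71% of positive cases with a linked PCR result, leaving 2,205 participants without one.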