Paper Title
ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users
Authors
Abstract
Recent advances have enabled automatic sound recognition systems for deaf and hard of hearing (DHH) users on mobile devices. However, these tools use pre-trained, generic sound recognition models, which do not meet the diverse needs of DHH users. We introduce ProtoSound, an interactive system for customizing sound recognition models by recording a few examples, thereby enabling personalized and fine-grained categories. ProtoSound is motivated by prior work examining sound awareness needs of DHH people and by a survey we conducted with 472 DHH participants. To evaluate ProtoSound, we characterized performance on two real-world sound datasets, showing significant improvement over the state of the art (e.g., +9.7% accuracy on the first dataset). We then deployed ProtoSound's end-user training and real-time recognition through a mobile application and recruited 19 hearing participants who listened to the real-world sounds and rated the accuracy across 56 locations (e.g., homes, restaurants, parks). Results show that ProtoSound personalized the model on device in real time and accurately learned sounds across diverse acoustic contexts. We close by discussing open challenges in personalizable sound recognition, including the need for better recording interfaces and algorithmic improvements.