Paper Title

AM-MobileNet1D: A Portable Model for Speaker Recognition

Authors

João Antônio Chagas Nunes, David Macêdo, Cleber Zanchettin

Abstract

Speaker Recognition and Speaker Identification are challenging tasks with essential applications such as automation, authentication, and security. Deep learning approaches like SincNet and AM-SincNet have presented great results on these tasks. This promising performance has taken these models into real-world applications, which are becoming fundamentally end-user driven and mostly mobile. Mobile computation requires applications with reduced storage size that are not processing- and memory-intensive and that consume energy efficiently. Deep learning approaches, in contrast, are usually energy-expensive and demand storage, processing power, and memory. To address this demand, we propose a portable model called Additive Margin MobileNet1D (AM-MobileNet1D) for Speaker Identification on mobile devices. We evaluated the proposed approach on the TIMIT and MIT datasets, obtaining equivalent or better performance than the baseline methods. Additionally, the proposed model takes only 11.6 megabytes of disk storage, against 91.2 megabytes for the SincNet and AM-SincNet architectures, making the model seven times faster with eight times fewer parameters.
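
To make the two ideas named in the abstract concrete, below is a minimal PyTorch sketch, not the authors' released code, of a MobileNet-style depthwise-separable 1D convolution block operating on raw waveforms together with an additive-margin (AM-Softmax) classification head. The class names, layer widths, and the margin/scale values (m, s) are illustrative assumptions rather than the paper's actual hyperparameters.

```python
# Minimal sketch of a depthwise-separable 1D conv block and an AM-Softmax head.
# All sizes and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthwiseSeparableConv1d(nn.Module):
    """MobileNet-style block: per-channel (depthwise) conv followed by a 1x1 (pointwise) conv."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm1d(in_ch)
        self.bn2 = nn.BatchNorm1d(out_ch)

    def forward(self, x):
        x = F.relu(self.bn1(self.depthwise(x)))
        return F.relu(self.bn2(self.pointwise(x)))


class AMSoftmaxHead(nn.Module):
    """Additive-margin softmax: cosine logits with a margin m subtracted on the target class, scaled by s."""

    def __init__(self, embedding_dim, num_speakers, s=30.0, m=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_speakers, embedding_dim))
        nn.init.xavier_normal_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalized embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Subtract the margin only from the target-class logit, then scale.
        one_hot = F.one_hot(labels, cosine.size(1)).to(cosine.dtype)
        logits = self.s * (cosine - self.m * one_hot)
        return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    # Toy forward pass: batch of 4 raw waveform chunks, 1 channel, 3200 samples.
    wave = torch.randn(4, 1, 3200)
    block = DepthwiseSeparableConv1d(1, 32, kernel_size=9, stride=4)
    features = block(wave)                 # (4, 32, 800)
    embeddings = features.mean(dim=-1)     # naive pooling to a 32-d speaker embedding
    head = AMSoftmaxHead(embedding_dim=32, num_speakers=10)
    loss = head(embeddings, torch.randint(0, 10, (4,)))
    print(features.shape, loss.item())
```

The depthwise-separable factorization is what drives the parameter and latency savings reported in the abstract, while the additive margin encourages larger angular separation between speaker classes than plain softmax.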
