To augment or not to augment? Data augmentation in user identification based on motion sensors

Paper title

To augment or not to augment? Data augmentation in user identification based on motion sensors

Authors

Benegui, Cezara; Ionescu, Radu Tudor

Abstract

Nowadays, commonly used authentication systems for mobile device users, e.g. password checking, face recognition or fingerprint scanning, are susceptible to various kinds of attacks. In order to prevent some of these attacks, such explicit authentication systems can be enhanced with a two-factor authentication scheme, in which the second factor is an implicit authentication system based on analyzing motion sensor data captured by accelerometers or gyroscopes. To avoid any additional burden on the user, the registration process of the implicit authentication system must be performed quickly, i.e. the number of data samples collected from the user is typically small. In the context of designing a machine learning model for implicit user authentication based on motion signals, data augmentation can play an important role. In this paper, we study several data augmentation techniques in search of useful augmentation methods for motion sensor data. We propose a set of four research questions related to data augmentation in the context of few-shot user identification based on motion sensor signals. We conduct experiments on a benchmark data set, using two deep learning architectures, convolutional neural networks and Long Short-Term Memory networks, showing which data augmentation methods bring accuracy improvements and when. Interestingly, we find that data augmentation is not very helpful, most likely because the signal patterns useful to discriminate users are too sensitive to the transformations brought by certain data augmentation techniques. This result somewhat contradicts the common belief that data augmentation increases the accuracy of machine learning models.
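To make concrete what "transformations brought by data augmentation techniques" can look like for a three-axis motion signal, here is a minimal sketch of three generic time-series augmentations: jittering (additive Gaussian noise), per-axis scaling, and circular time shifting. These particular transformations, their default parameters, and the helper names `jitter`, `scale`, `time_shift` and `augment` are illustrative assumptions, not necessarily the techniques or settings evaluated in the paper.

```python
import random
from typing import List, Optional, Tuple

# One (x, y, z) reading from an accelerometer or gyroscope; a signal is a sequence of readings.
Sample = Tuple[float, float, float]
Signal = List[Sample]


def jitter(signal: Signal, sigma: float = 0.05, rng: Optional[random.Random] = None) -> Signal:
    """Add independent Gaussian noise (std = sigma) to every axis of every reading."""
    rng = rng or random.Random()
    return [tuple(v + rng.gauss(0.0, sigma) for v in s) for s in signal]


def scale(signal: Signal, sigma: float = 0.1, rng: Optional[random.Random] = None) -> Signal:
    """Multiply each axis by a random factor drawn once per axis for the whole signal."""
    rng = rng or random.Random()
    factors = [rng.gauss(1.0, sigma) for _ in range(3)]
    return [tuple(v * f for v, f in zip(s, factors)) for s in signal]


def time_shift(signal: Signal, max_shift: int = 5, rng: Optional[random.Random] = None) -> Signal:
    """Circularly rotate a non-empty signal in time by up to max_shift steps in either direction."""
    rng = rng or random.Random()
    k = rng.randint(-max_shift, max_shift) % len(signal)
    return signal[k:] + signal[:k]


def augment(signal: Signal, copies: int, rng: random.Random) -> List[Signal]:
    """Create `copies` augmented versions of one enrollment signal (hypothetical helper)."""
    transforms = [jitter, scale, time_shift]
    out = []
    for _ in range(copies):
        transform = rng.choice(transforms)  # pick one transformation per copy
        out.append(transform(signal, rng=rng))
    return out
```

Each transformation perturbs either the amplitude or the timing of the signal. If the cues that distinguish one user from another live in exactly those properties, the augmented copies may blur them rather than add useful variety, which is in line with the explanation the authors give for finding augmentation not very helpful.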
