Predicting Global HRTFs From Scanned Head Geometry Using Deep Learning and Compact Representations

Paper Title

Predicting Global HRTFs From Scanned Head Geometry Using Deep Learning and Compact Representations

Authors

Yuxiang Wang, You Zhang, Zhiyao Duan, Mark Bocko

Abstract

In the growing field of virtual auditory display, personalized head-related transfer functions (HRTFs) play a vital role in establishing an accurate sound image for mixed and augmented reality applications. In this work, we propose an HRTF personalization method that employs convolutional neural networks (CNNs) to predict a subject's HRTFs for all directions from their scanned head geometry. To ease the training of the CNN models, we propose novel pre-processing methods for both the head scans and the HRTF data to achieve compact representations. For the head scan, we use truncated spherical cap harmonic (SCH) coefficients to represent the pinna area, which is important in the acoustic scattering process. For the HRTF data, we use truncated spherical harmonic (SH) coefficients to represent the HRTF magnitudes and onsets. One CNN model is trained to predict the SH coefficients of the HRTF magnitudes from the SCH coefficients of the scanned ear geometry and other anthropometric measurements of the head. The other CNN model is trained to predict the SH coefficients of the HRTF onsets from only the anthropometric measurements of the ear, head, and torso. Combining the magnitude and onset predictions, our method is able to predict complete and global HRTF data. A leave-one-out validation with the log-spectral distortion (LSD) metric is used for objective evaluation. The results show a decent LSD level in both the spatial and temporal dimensions compared to the ground-truth HRTFs, and a lower LSD than the boundary element method (BEM) simulation of HRTFs that the database provides. The localization simulation results with an auditory model are also consistent with the objective evaluation metrics, showing that the localization responses with our predicted HRTFs are significantly better than those with the BEM-calculated ones.
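
The abstract names the log-spectral distortion (LSD) metric but does not give its formula. The sketch below uses one common definition, the root-mean-square difference in dB between two magnitude spectra across frequency bins; whether the paper uses exactly this form is an assumption. The function name `log_spectral_distortion` and the `eps` guard are illustrative and not taken from the paper.

```python
import math


def log_spectral_distortion(h_true, h_pred, eps=1e-12):
    """RMS difference in dB between two magnitude spectra of equal length.

    Assumed definition (not confirmed by the abstract):
        LSD = sqrt( mean_k ( 20 * log10(|H(k)| / |H_hat(k)|) )^2 )
    """
    if len(h_true) != len(h_pred) or len(h_true) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_sum = 0.0
    for a, b in zip(h_true, h_pred):
        # eps keeps the ratio finite when a bin has zero magnitude
        diff_db = 20.0 * math.log10((abs(a) + eps) / (abs(b) + eps))
        squared_sum += diff_db ** 2
    return math.sqrt(squared_sum / len(h_true))


# Usage: a prediction that is ten times too large in every bin is 20 dB off.
reference = [1.0, 1.0, 1.0]
prediction = [10.0, 10.0, 10.0]
lsd_db = log_spectral_distortion(reference, prediction)
```

To score a full HRTF set under this definition, the same function can be applied per direction and the results averaged over directions.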
