C-SL: Contrastive Sound Localization with Inertial-Acoustic Sensors

Paper title

C-SL: Contrastive Sound Localization with Inertial-Acoustic Sensors

Authors

Mirbagheri, Majid; Doosti, Bardia

Abstract

The human brain employs perceptual information about head and eye movements to update the spatial relationship between the individual and the surrounding environment. Based on this cognitive process, known as spatial updating, we introduce contrastive sound localization (C-SL) with mobile inertial-acoustic sensor arrays of arbitrary geometry. C-SL uses unlabeled multi-channel audio recordings and inertial measurement unit (IMU) readings collected during free rotational movements of the array to learn, in a self-supervised manner, mappings from acoustical measurements to an array-centered direction of arrival (DOA). Unlike conventional DOA estimation methods, which require knowledge of either the array geometry or the source locations in the calibration stage, C-SL is agnostic to both and can be trained on data collected in minimally constrained settings. To achieve this capability, our proposed method utilizes a customized contrastive loss that measures the spatial contrast between source locations predicted for disjoint segments of the input, jointly updating the estimated DOAs and the acoustic-spatial mapping in linear time. We provide quantitative and qualitative evaluations of C-SL, comparing its performance with baseline DOA estimation methods in a wide range of conditions. We believe the relaxed calibration process offered by C-SL paves the way toward truly personalized augmented hearing applications.
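As a rough illustration of the spatial-updating idea in the abstract, the sketch below scores how well array-centered DOA predictions for several segments agree with the rotation reported by an IMU, assuming a single static source. It is not the contrastive loss used in the paper, whose exact form the abstract does not give. The function names `world_azimuth` and `consistency_loss`, the azimuth-only (yaw-only) setup, and the choice to compare consecutive segments are all assumptions made for illustration.

```python
import math


def world_azimuth(array_azimuth, array_yaw):
    """Convert an array-centered azimuth to a world-frame azimuth.

    Both angles are in radians. If the array is rotated by `array_yaw`,
    a source seen at `array_azimuth` in the array frame lies at
    `array_azimuth + array_yaw` in the world frame (wrapped to [-pi, pi)).
    """
    total = array_azimuth + array_yaw
    return (total + math.pi) % (2 * math.pi) - math.pi


def consistency_loss(pred_azimuths, yaws):
    """Hypothetical consistency score for a static source (illustrative only).

    For each pair of consecutive segments, compare the world-frame azimuths
    implied by the predicted array-centered DOA and the IMU yaw. The term
    1 - cos(difference) is zero when they agree and grows as they diverge.
    Using consecutive pairs keeps the cost linear in the number of segments.
    """
    if len(pred_azimuths) != len(yaws) or len(yaws) < 2:
        raise ValueError("need at least two segments with matching yaw readings")
    world = [world_azimuth(a, y) for a, y in zip(pred_azimuths, yaws)]
    pairs = list(zip(world[:-1], world[1:]))
    return sum(1.0 - math.cos(a - b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    yaws = [0.0, 0.3, -0.2]                   # IMU yaw per segment (radians)
    consistent = [0.5 - y for y in yaws]      # predictions that track the rotation
    stuck = [0.5, 0.5, 0.5]                   # predictions that ignore the rotation
    print(consistency_loss(consistent, yaws))
    print(consistency_loss(stuck, yaws))
```

In this toy setup, predictions that shift opposite to the array's rotation give a score near zero, while predictions that ignore the rotation are penalized. No source position or array geometry enters the computation, which is the property the abstract emphasizes.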
