Paper Title

Simultaneously Learning Robust Audio Embeddings and balanced Hash codes for Query-by-Example

Authors

Anup Singh, Kris Demuynck, Vipul Arora

Abstract

Audio fingerprinting systems must efficiently and robustly identify query snippets in an extensive database. To this end, state-of-the-art systems use deep learning to generate compact audio fingerprints. These systems deploy indexing methods, which quantize fingerprints to hash codes in an unsupervised manner to expedite the search. However, these methods generate imbalanced hash codes, leading to their suboptimal performance. Therefore, we propose a self-supervised learning framework to compute fingerprints and balanced hash codes in an end-to-end manner to achieve both fast and accurate retrieval performance. We model hash codes as a balanced clustering process, which we regard as an instance of the optimal transport problem. Experimental results indicate that the proposed approach improves retrieval efficiency while preserving high accuracy, particularly at high distortion levels, compared to the competing methods. Moreover, our system is efficient and scalable in computational load and memory storage.
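
The abstract frames balanced hash codes as a balanced clustering problem, treated as an instance of optimal transport. The sketch below illustrates that general idea with plain Sinkhorn-Knopp normalization; it is not taken from the paper. The function names, the `epsilon` temperature, the iteration count, and the toy similarity scores are all assumptions made for illustration, and the authors' actual formulation may differ.

```python
import math


def sinkhorn_balanced_assignment(scores, epsilon=0.05, n_iters=50):
    """Turn an N x K score matrix into a soft assignment whose K buckets
    receive (approximately) equal total mass.

    `scores[i][j]` is a hypothetical similarity between fingerprint i and
    hash bucket j. This is the generic Sinkhorn-Knopp iteration, used here
    only as an illustration of balanced clustering via optimal transport.
    """
    n, k = len(scores), len(scores[0])
    # Subtract the global maximum before exponentiating, for stability.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: rescale so every bucket holds total mass 1/k.
        for j in range(k):
            col_sum = sum(q[i][j] for i in range(n))
            for i in range(n):
                q[i][j] *= (1.0 / k) / col_sum
        # Row step: rescale so every item holds total mass 1/n.
        for i in range(n):
            row_sum = sum(q[i])
            q[i] = [v * (1.0 / n) / row_sum for v in q[i]]
    # Multiply by n so each row is a probability distribution over buckets.
    return [[v * n for v in row] for row in q]


def hard_codes(matrix):
    """Pick the highest-scoring bucket index for every row."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


# Toy scores (made up): every item prefers bucket 0, so plain argmax
# puts all four items into the same bucket.
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
naive = hard_codes(toy_scores)            # [0, 0, 0, 0]
balanced = hard_codes(sinkhorn_balanced_assignment(toy_scores))  # [0, 0, 1, 1]
```

On this toy input, unconstrained argmax sends every item to bucket 0, while the balanced assignment moves the two items with the weakest preference for bucket 0 into bucket 1. Evenly filled buckets are what keep the number of candidates per hash lookup small, which is the efficiency argument the abstract makes against imbalanced codes.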
