Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data

Paper Title

Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data

Authors

Vincent Roger, Jérôme Farinas, Julien Pinquier

Abstract

Most state-of-the-art speech systems use Deep Neural Networks (DNNs). These systems require a large amount of data for training. Hence, training state-of-the-art frameworks on under-resourced languages or speech problems is a difficult task. One example is the limited amount of data available for impaired speech. Furthermore, acquiring more data and/or expertise is time-consuming and expensive. In this paper we focus on the following speech processing tasks: Automatic Speech Recognition, speaker identification, and emotion recognition. To assess the problem of limited data, we first investigate state-of-the-art Automatic Speech Recognition systems, as this is the hardest task (due to the large variability in each language). Next, we provide an overview of techniques and tasks requiring less data. In the last section we investigate few-shot techniques, as we interpret under-resourced speech as a few-shot problem. In that sense, we propose an overview of few-shot techniques and perspectives on using such techniques for the speech problems considered in this survey. It turns out that the reviewed techniques are not well adapted to large datasets. Nevertheless, some promising results from the literature encourage the use of such techniques for speech processing.
