Paper Title
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models
Paper Authors
Paper Abstract
Neural models are known to be over-parameterized, and recent work has shown that sparse text-to-speech (TTS) models can outperform dense ones. Although a plethora of sparsity methods has been proposed for other domains, such methods have rarely been applied to TTS. In this work, we seek to answer the question: how do selected sparsity techniques affect performance and model complexity? We compare a Tacotron2 baseline against the results of applying five sparsity techniques, evaluating performance in terms of naturalness, intelligibility, and prosody while also reporting model size and training time. Complementing prior research, we find that pruning before or during training can achieve performance similar to pruning after training while training much faster, and that removing entire neurons degrades performance far more than removing individual parameters. To the best of our knowledge, this is the first work to compare sparsity paradigms in text-to-speech synthesis.
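To make the parameter-vs-neuron distinction concrete, here is a minimal sketch, not the paper's actual setup, contrasting unstructured parameter pruning with structured neuron pruning using PyTorch's standard torch.nn.utils.prune API. The layer size (256x256) and the 50% sparsity level are illustrative assumptions, not values taken from the paper.

```python
# Illustrative comparison of unstructured vs. structured pruning with
# torch.nn.utils.prune. Layer size and sparsity are assumptions for the demo.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer_unstructured = nn.Linear(256, 256)
layer_structured = nn.Linear(256, 256)

# Unstructured: zero out the 50% of individual weights with the smallest
# L1 magnitude, scattered anywhere in the weight matrix.
prune.l1_unstructured(layer_unstructured, name="weight", amount=0.5)

# Structured: remove entire output neurons (rows of the weight matrix,
# dim=0) with the smallest L2 norm -- the coarser scheme the abstract
# reports as degrading quality more.
prune.ln_structured(layer_structured, name="weight", amount=0.5, n=2, dim=0)

# Both layers now carry a binary mask; make the pruning permanent.
prune.remove(layer_unstructured, "weight")
prune.remove(layer_structured, "weight")

# Same nominal sparsity, very different structure.
for tag, layer in [("unstructured", layer_unstructured),
                   ("structured", layer_structured)]:
    w = layer.weight
    zeros = 100.0 * (w == 0).float().mean().item()
    dead = (w.abs().sum(dim=1) == 0).sum().item()
    print(f"{tag}: {zeros:.1f}% zero weights, {dead} fully-pruned neurons")
```

Both layers end up with 50% of their weights zeroed, but only the structured variant has whole rows removed; that coarser granularity is hardware-friendly yet, per the abstract's findings, costs more synthesis quality than fine-grained parameter removal.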