Paper Title
Voice conversion using coefficient mapping and neural network
Paper Authors
Paper Abstract
This research presents a voice conversion model using coefficient mapping and a neural network. Most previous work on parametric speech synthesis did not account for losses in spectral detail, causing over-smoothing and, invariably, an appreciable deviation of the converted speech from the target speaker. An improved model that uses both linear predictive coding (LPC) and line spectral frequency (LSF) coefficients to parametrize the source speech signal was developed in this work to reveal the effect of over-smoothing. The non-linear mapping ability of a neural network was employed to map the source speech vectors into the acoustic vector space of the target. Training a neural network directly on LPC coefficients yielded poor results owing to the instability of the LPC filter poles, so the LPC coefficients were converted to line spectral frequency coefficients before being trained with a 3-layer neural network. The algorithm was tested on noisy data and evaluated using the Mel-Cepstral Distance measure; the cepstral distance evaluation shows a 35.7 percent reduction in the spectral distance between the target and the converted speech.
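
The abstract gives no implementation details, but the pipeline it describes (LPC-to-LSF conversion, a 3-layer neural network mapping source frames to target frames, and Mel-Cepstral Distance scoring) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the root-splitting LPC-to-LSF routine, the MLPRegressor stand-in for the 3-layer network (input, one hidden layer, output), the hidden-layer size, and the conventional 10/ln(10) MCD scaling are all assumptions about details the abstract leaves unspecified.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def lpc_to_lsf(a):
        """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral
        frequencies in (0, pi) via the symmetric/antisymmetric split."""
        a = np.asarray(a, dtype=float)
        p = len(a) - 1
        ext = np.concatenate([a, [0.0]])
        P = ext + ext[::-1]            # P(z) = A(z) + z^-(p+1) A(1/z)
        Q = ext - ext[::-1]            # Q(z) = A(z) - z^-(p+1) A(1/z)
        if p % 2 == 0:
            P = np.polydiv(P, [1.0, 1.0])[0]        # remove trivial root at z = -1
            Q = np.polydiv(Q, [1.0, -1.0])[0]       # remove trivial root at z = +1
        else:
            Q = np.polydiv(Q, [1.0, 0.0, -1.0])[0]  # remove trivial roots at z = +/-1
        angles = np.concatenate([np.angle(np.roots(P)), np.angle(np.roots(Q))])
        return np.sort(angles[angles > 0])  # one LSF per conjugate root pair

    def mel_cepstral_distance(c_target, c_converted):
        """Frame-averaged MCD in dB; the 0th (energy) coefficient is excluded."""
        diff = c_target[:, 1:] - c_converted[:, 1:]
        return float(np.mean((10.0 / np.log(10)) *
                             np.sqrt(2.0 * (diff ** 2).sum(axis=1))))

    def train_mapping(src_lsf, tgt_lsf):
        """Fit a 3-layer network mapping per-frame source LSF vectors
        (n_frames, p) to the corresponding target LSF vectors.
        The hidden-layer width of 64 is a hypothetical choice."""
        net = MLPRegressor(hidden_layer_sizes=(64,), activation="tanh",
                           max_iter=2000, random_state=0)
        net.fit(src_lsf, tgt_lsf)
        return net

The LSF representation matters here for the reason the abstract hints at: a network output in raw LPC space need not correspond to a stable all-pole filter, whereas LSFs are angles interleaved on the unit circle, so small mapping errors remain interpretable as frequencies and stability is easy to restore by re-sorting.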