Paper title
Inferring Pitch from Coarse Spectral Features
Paper authors
Paper abstract
Fundamental frequency (F0) has long been treated as the physical definition of "pitch" in phonetic analysis. But there have been many demonstrations that F0 is at best an approximation to pitch, both in production and in perception: pitch is not F0, and F0 is not pitch. Changes in pitch involve many articulatory and acoustic covariates; pitch perception often deviates from what F0 analysis predicts; and in fact, quasi-periodic signals from a single voice source are often incompletely characterized by an attempt to define a single time-varying F0. In this paper, we find strong support for the existence of covariates for pitch in aspects of relatively coarse spectra, in which an overtone series is not available. Thus linear regression can predict the pitch of simple vocalizations, produced by an articulatory synthesizer or by humans, from single frames of such coarse spectra. Across speakers, and in more complex vocalizations, our experiments indicate that the covariates are not quite so simple, though they are apparently still available for more sophisticated modeling. On this basis, we propose that the field needs a better way of thinking about speech pitch, just as celestial mechanics requires us to go beyond Newton's point-mass approximations to heavenly bodies.
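The regression setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the spectral features or the fitting procedure, so the band-pooled log spectrum, the number of bands, and the helper names `coarse_spectrum`, `fit_linear_pitch_model`, and `predict_pitch` are all assumptions made here. The only idea taken from the text is that a single frame of a spectrum too coarse to resolve individual harmonics is mapped to a pitch value by a linear model.

```python
import numpy as np


def coarse_spectrum(frame, n_bands=12):
    """Log energy of one frame in a few wide, equal-width frequency bands.

    Assumption: pooling FFT power into wide bands stands in for the
    "coarse spectra" of the abstract; the paper's features may differ.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    bands = np.array_split(power, n_bands)  # wide bands smear out harmonics
    return np.log(np.array([band.sum() for band in bands]) + 1e-12)


def _with_intercept(features):
    """Append a constant column so the model can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_linear_pitch_model(features, pitch):
    """Ordinary least squares from per-frame features to per-frame pitch."""
    weights, *_ = np.linalg.lstsq(_with_intercept(features), pitch, rcond=None)
    return weights


def predict_pitch(features, weights):
    """Apply a fitted linear model to new per-frame features."""
    return _with_intercept(features) @ weights
```

In use, one would stack `coarse_spectrum(frame)` for each analysis frame into a matrix, pair each row with a reference pitch value for that frame, fit on one subset of frames, and evaluate predictions on held-out frames. How the reference pitch is obtained is left open here, since the abstract does not say.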