Paper Title

A Simple Feature Method for Prosody Rhythm Comparison

Authors

Mariana Julião, Alberto Abad, Helena Moniz

Abstract

Of all components of Prosody, Rhythm has been regarded as the hardest to address, as it is tightly linked to Pitch and Intensity. Nevertheless, Rhythm is a very good indicator of a speaker's fluency in a foreign language or even of some diseases. Canonical ways to measure Rhythm, such as $\Delta C$ or $\%V$, involve a cumbersome process of segment alignment, often leading to modest and questionable results. Perceptually, however, Rhythm does not seem as difficult, as humans can grasp it even when the text is not fully intelligible. In this work, we develop an empirical and unsupervised method of rhythm assessment, which does not rely on the content. We have created a fixed-length representation of each utterance, Peak Embedding (PE), which codifies the proportional distance between peaks of the chosen Low-Level Descriptors. Clustering pairs of small sentence-like units, we have attained averages of 0.444 for Silhouette Coefficient using PE with Loudness, and 0.979 for Global Separability Index with a combination of PE with Pitch and Loudness. Clustering same-structure words, we have attained averages of 0.196 for Silhouette Coefficient and 0.864 for Global Separability Index for PE with Loudness.
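
The abstract does not spell out how a Peak Embedding is built, so the following is only a minimal sketch of one possible reading of "a fixed-length representation that codifies the proportional distance between peaks". Everything in it is an assumption made for illustration, not the authors' implementation: the function names `find_peaks` and `peak_embedding`, the strict-local-maximum peak picker, the histogram encoding, and the `n_bins` parameter are all hypothetical.

```python
from typing import List, Sequence


def find_peaks(contour: Sequence[float]) -> List[int]:
    """Return indices of strict local maxima in a descriptor contour.

    Hypothetical peak picker; a real system would likely smooth the
    contour and apply a prominence threshold first.
    """
    return [
        i
        for i in range(1, len(contour) - 1)
        if contour[i - 1] < contour[i] > contour[i + 1]
    ]


def peak_embedding(contour: Sequence[float], n_bins: int = 8) -> List[float]:
    """Encode inter-peak gaps as a fixed-length normalized histogram.

    Assumed encoding: each gap between consecutive peaks is divided by
    the span from the first to the last peak, so the values describe
    proportions rather than absolute durations.
    """
    peaks = find_peaks(contour)
    embedding = [0.0] * n_bins
    if len(peaks) < 2:
        return embedding  # no gap to measure: all-zero vector

    span = peaks[-1] - peaks[0]
    gaps = [(b - a) / span for a, b in zip(peaks, peaks[1:])]
    for gap in gaps:
        # gap lies in (0, 1]; clamp so that gap == 1.0 falls in the last bin
        index = min(int(gap * n_bins), n_bins - 1)
        embedding[index] += 1.0 / len(gaps)
    return embedding


# Toy loudness-like contour with peaks at frames 1, 3 and 9.
toy = [0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
print(peak_embedding(toy, n_bins=4))
```

Because the gaps are expressed as proportions, a uniformly slower rendition of the same pattern (peaks at frames 2, 6 and 18, say) maps to the same vector under this sketch, which is the kind of content-independent, tempo-tolerant comparison the abstract describes. Vectors like these could then be clustered and scored with a measure such as the Silhouette Coefficient.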
