Paper title
DISCO PAL: Diachronic Spanish Sonnet Corpus with Psychological and Affective Labels
Paper authors
Paper abstract
Text mining is now widely applied to corpora in many languages. However, most applications target prose, and few work with poetry. One example of text mining applied to poetry is the use of features derived from a poem's individual words to capture lexical, sublexical and interlexical meaning and to infer the General Affective Meaning (GAM) of the text. Although this approach has proved useful for poetry in some languages, studies are lacking both for Spanish poetry and for highly structured poetic compositions such as sonnets. This article presents a study of an annotated corpus of Spanish sonnets that analyses whether features built from their individual words can predict their GAM. The aim is to model sonnets at an affective level. The article also analyses the relationship between the GAM of the sonnets and their content. To this end, we consider the content from a psychological perspective, using tags to mark when a sonnet is related to a specific psychological term, and then study how GAM varies across those terms. The corpus contains 274 Spanish sonnets by authors from the 15th to the 19th century. It was annotated by several domain experts with affective and lexico-semantic features, as well as with domain concepts from psychology. As a result, the corpus can be used in applications such as poetry recommender systems, personality text mining studies of the authors, or the use of poetry for therapeutic purposes.
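The idea of building sonnet-level features from individual words could look roughly like the sketch below. It is only an illustration under stated assumptions, not the procedure used in the article: the lexicon `TOY_LEXICON`, its scores, the feature names, and the function `affective_features` are all hypothetical, and a real study would rely on published affective norms and the corpus annotations.

```python
import re
from statistics import mean

# Hypothetical word-level affective norms: word -> (valence, arousal).
# The values are invented for illustration and come from no real lexicon.
TOY_LEXICON = {
    "amor": (0.9, 0.6),
    "muerte": (-0.8, 0.7),
    "dolor": (-0.7, 0.6),
    "dulce": (0.7, 0.3),
}


def tokenize(text):
    """Lowercase the text and keep runs of Spanish letters as tokens."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def affective_features(text, lexicon=TOY_LEXICON):
    """Aggregate word-level scores into simple text-level features."""
    tokens = tokenize(text)
    scored = [lexicon[t] for t in tokens if t in lexicon]
    if not scored:
        # No lexicon word found: return neutral placeholder features.
        return {"mean_valence": 0.0, "mean_arousal": 0.0, "coverage": 0.0}
    return {
        "mean_valence": mean(v for v, _ in scored),
        "mean_arousal": mean(a for _, a in scored),
        # Share of tokens that the lexicon covers.
        "coverage": len(scored) / len(tokens),
    }


if __name__ == "__main__":
    print(affective_features("Dulce amor, dulce dolor"))
```

Features of this kind could then be fed to any regressor or classifier trained on the expert GAM annotations. The `coverage` value is included because low lexicon coverage would make the averaged scores unreliable.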