Paper title
BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions
Authors
Abstract
Parliamentary transcripts provide a valuable resource for understanding reality and learning about the most important events that occur over time in our societies. Furthermore, the political debates captured in these transcripts facilitate research on political discourse from a computational social science perspective. In this paper, we release the first version of a newly compiled corpus of Basque parliamentary transcripts. The corpus is characterized by heavy Basque-Spanish code-switching and represents an interesting resource for studying political discourse in contrasting languages such as Basque and Spanish. We enrich the corpus with metadata on relevant attributes of the speakers and speeches (language, gender, party, etc.) and process the text to obtain named entities and lemmas. The obtained metadata is then used to perform a detailed corpus analysis, which provides interesting insights into how Basque political representatives use language across time, parties and gender.