Paper Title

Political Advertising Dataset: the use case of the Polish 2020 Presidential Elections

Authors

Augustyniak, Łukasz, Rajda, Krzysztof, Kajdanowicz, Tomasz, Bernaczyk, Michał

Abstract

Political campaigns are full of political ads posted by candidates on social media. Political advertisements constitute a basic form of campaigning and are subject to various social requirements. We present the first publicly available dataset for detecting specific text chunks and categories of political advertising in the Polish language. It contains 1,705 human-annotated tweets tagged with nine categories, which constitute campaigning under Polish electoral law. We achieved an inter-annotator agreement of 0.65 (Cohen's kappa score). An additional annotator resolved the mismatches between the first two annotators, improving the consistency and complexity of the annotation process. We used the newly created dataset to train a well-established neural tagger (achieving an F1 score of 70%). We also present a possible direction of use cases for such datasets and models, with an initial analysis of the Polish 2020 Presidential Elections on Twitter.
