Offensive Language Identification in Greek

Paper title

Offensive Language Identification in Greek

Authors

Pitenis, Zeses; Zampieri, Marcos; Ranasinghe, Tharindu

Abstract

As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc. With a few notable exceptions, most research on this topic so far has dealt with English. This is mostly due to the availability of language resources for English. To address this shortcoming, this paper presents the first Greek annotated dataset for offensive language identification: the Offensive Greek Tweet Dataset (OGTD). OGTD is a manually annotated dataset containing 4,779 posts from Twitter annotated as offensive and not offensive. Along with a detailed description of the dataset, we evaluate several computational models trained and tested on this data.
