Paper Title

K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment

Authors

Jean Lee, Taejun Lim, Heejun Lee, Bogeun Jo, Yangsok Kim, Heegeun Yoon, Soyeon Caren Han

Abstract

Online hate speech detection has become an important issue due to the growth of online content, but resources in languages other than English are extremely limited. We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns. The dataset consists of 109k utterances from news comments, provides multi-label classification using 1 to 4 labels per utterance, and handles subjectivity and intersectionality. We evaluate strong baselines on K-MHaS using Korean BERT-based language models with six different metrics. KR-BERT with a sub-character tokenizer outperforms the other models, as it recognizes decomposed characters in each hate speech class.
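The last sentence credits KR-BERT's sub-character tokenizer with recognizing decomposed characters. Each precomposed Hangul syllable is built from two or three jamo (a leading consonant, a vowel, and an optional trailing consonant), and the Unicode Standard defines an arithmetic mapping from syllable to jamo. The minimal Python sketch below shows only that decomposition step, assuming the standard Unicode algorithm; it is not KR-BERT's actual tokenizer, and the function name `decompose_hangul` is hypothetical.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition algorithm
S_BASE = 0xAC00  # first precomposed syllable, '가'
L_BASE = 0x1100  # first leading consonant (choseong)
V_BASE = 0x1161  # first vowel (jungseong)
T_BASE = 0x11A7  # one before the first trailing consonant (jongseong)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables share each leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total


def decompose_hangul(text: str) -> str:
    """Split each precomposed Hangul syllable into its conjoining jamo.

    Hypothetical helper for illustration; non-Hangul characters pass through unchanged.
    """
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))               # leading consonant
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))   # vowel
            t_index = s_index % T_COUNT
            if t_index:  # the trailing consonant is optional
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    word = "한국"  # "Korea": two syllables
    jamo = decompose_hangul(word)
    print(len(word), len(jamo))  # 2 6
    print([f"U+{ord(c):04X}" for c in jamo])
    # The standard library's NFD normalization applies the same algorithm
    assert jamo == unicodedata.normalize("NFD", word)
```

In practice `unicodedata.normalize("NFD", text)` produces the same jamo sequence; the explicit arithmetic is shown only to make the syllable-to-jamo mapping visible. How KR-BERT builds its sub-character vocabulary on top of such a decomposition is not described here.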