K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment

Paper Title

K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment

Paper Authors

Jean Lee, Taejun Lim, Heejun Lee, Bogeun Jo, Yangsok Kim, Heegeun Yoon, Soyeon Caren Han

Abstract

Online hate speech detection has become an important issue due to the growth of online content, but resources in languages other than English are extremely limited. We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns. The dataset consists of 109k utterances from news comments and provides a multi-label classification using 1 to 4 labels, and handles subjectivity and intersectionality. We evaluate strong baseline experiments on K-MHaS using Korean-BERT-based language models with six different metrics. KR-BERT with a sub-character tokenizer outperforms others, recognizing decomposed characters in each hate speech class.
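
As a rough illustration of what "decomposed characters" means for Korean text, the sketch below splits Hangul syllables into their constituent jamo using Unicode NFD normalization from Python's standard library. This is an assumption made for illustration only: the abstract does not describe how the KR-BERT sub-character tokenizer is implemented, and the helper names `decompose_hangul` and `recompose_hangul` are hypothetical.

```python
import unicodedata


def decompose_hangul(text: str) -> list[str]:
    """Split text into sub-character units (jamo for Hangul syllables).

    Assumption: Unicode NFD normalization stands in for a sub-character
    tokenizer's decomposition step; the real KR-BERT tokenizer may differ.
    """
    return list(unicodedata.normalize("NFD", text))


def recompose_hangul(units: list[str]) -> str:
    """Join sub-character units back into precomposed syllables (NFC)."""
    return unicodedata.normalize("NFC", "".join(units))


if __name__ == "__main__":
    word = "한국어"
    units = decompose_hangul(word)
    # Each precomposed syllable expands into two or three jamo code points.
    print(len(word), len(units))
    print(recompose_hangul(units) == word)
```

Working at this level lets a model share units across syllables that differ by a single consonant or vowel, which is one plausible reason a sub-character vocabulary could help on noisy comment text.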
