Paper Title
UMass PCL at SemEval-2022 Task 4: Pre-trained Language Model Ensembles for Detecting Patronizing and Condescending Language
Paper Authors
Paper Abstract
Patronizing and condescending language (PCL) is everywhere, but rarely is the focus on its use by media towards vulnerable communities. Accurately detecting PCL of this form is a difficult task due to limited labeled data and how subtle it can be. In this paper, we describe our system for detecting such language which was submitted to SemEval 2022 Task 4: Patronizing and Condescending Language Detection. Our approach uses an ensemble of pre-trained language models, data augmentation, and optimizing the threshold for detection. Experimental results on the evaluation dataset released by the competition hosts show that our work is reliably able to detect PCL, achieving an F1 score of 55.47% on the binary classification task and a macro F1 score of 36.25% on the fine-grained, multi-label detection task.
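The abstract mentions optimizing the detection threshold for the binary task. A minimal sketch of what such a step could look like, assuming the ensemble outputs a probability per example and the threshold is chosen by a grid sweep maximizing F1 on held-out data (the function name, grid, and procedure here are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def best_f1_threshold(probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    """Sweep candidate thresholds over predicted probabilities and
    return the (threshold, F1) pair maximizing F1 on held-out labels.
    Illustrative sketch only; the paper's actual tuning may differ."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = (probs >= t).astype(int)
        tp = int(np.sum((preds == 1) & (labels == 1)))
        fp = int(np.sum((preds == 1) & (labels == 0)))
        fn = int(np.sum((preds == 0) & (labels == 1)))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy usage: probabilities from a hypothetical ensemble on 4 examples.
probs = np.array([0.1, 0.4, 0.6, 0.9])
labels = np.array([0, 0, 1, 1])
threshold, f1 = best_f1_threshold(probs, labels)
```

Tuning the threshold this way matters for PCL detection because the positive class is rare, so the default 0.5 cutoff rarely maximizes F1.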