Matching Tweets With Applicable Fact-Checks Across Languages

Paper title

Matching Tweets With Applicable Fact-Checks Across Languages

Authors

Ashkan Kazemi, Zehua Li, Verónica Pérez-Rosas, Scott A. Hale, Rada Mihalcea

Abstract

An important challenge for news fact-checking is the effective dissemination of existing fact-checks. This in turn brings the need for reliable methods to detect previously fact-checked claims. In this paper, we focus on automatically finding existing fact-checks for claims made in social media posts (tweets). We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings using multilingual transformer models such as XLM-RoBERTa and multilingual embeddings such as LaBSE and SBERT. We present promising results for "match" classification (86% average accuracy) in four language pairs. We also find that a BM25 baseline outperforms or is on par with state-of-the-art multilingual embedding models for the retrieval task during our monolingual experiments. We highlight and discuss NLP challenges while addressing this problem in different languages, and we introduce a novel curated dataset of fact-checks and corresponding tweets for future research.
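To make the retrieval setting concrete, the following is a minimal sketch of a BM25 ranker that scores candidate fact-checks against a tweet. It is not the authors' implementation: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the example fact-checks and tweet are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. A real system would
    # likely use a language-aware tokenizer.
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small in-memory collection (illustrative only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_freqs = [Counter(toks) for toks in self.doc_tokens]
        self.doc_lens = [len(toks) for toks in self.doc_tokens]
        self.avg_len = sum(self.doc_lens) / len(docs)

        # Document frequency: number of documents containing each term.
        n = len(docs)
        df = Counter()
        for toks in self.doc_tokens:
            df.update(set(toks))
        # Smoothed IDF variant that is never negative.
        self.idf = {
            term: math.log(1 + (n - freq + 0.5) / (freq + 0.5))
            for term, freq in df.items()
        }

    def score(self, query, index):
        """BM25 score of the document at `index` for `query`."""
        freqs = self.doc_freqs[index]
        # Length normalization term for this document.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[index] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            tf = freqs.get(term, 0)
            if tf:
                total += self.idf[term] * tf * (self.k1 + 1) / (tf + norm)
        return total

    def rank(self, query):
        """Document indices sorted from best to worst match."""
        scored = [(self.score(query, i), i) for i in range(len(self.doc_tokens))]
        return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical fact-checks and tweet, invented for this example.
fact_checks = [
    "No, drinking hot water does not cure the virus",
    "The photo of a flooded airport is from 2017, not this week",
    "Claim that the election was postponed is false",
]
tweet = "is it true the election was postponed"

bm25 = BM25(fact_checks)
ranking = bm25.rank(tweet)
print(ranking[0])  # index of the top-ranked fact-check
```

Because BM25 relies on exact term overlap, a sketch like this only applies when the tweet and the fact-check share a language; the cross-lingual (Hindi-English) setting described above would need multilingual embeddings or translation instead.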
