Paper title
Matching Tweets With Applicable Fact-Checks Across Languages
Paper authors
Abstract
An important challenge for news fact-checking is the effective dissemination of existing fact-checks. This in turn brings the need for reliable methods to detect previously fact-checked claims. In this paper, we focus on automatically finding existing fact-checks for claims made in social media posts (tweets). We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings using multilingual transformer models such as XLM-RoBERTa and multilingual embeddings such as LaBSE and SBERT. We present promising results for "match" classification (86% average accuracy) in four language pairs. We also find that a BM25 baseline outperforms or is on par with state-of-the-art multilingual embedding models for the retrieval task during our monolingual experiments. We highlight and discuss NLP challenges while addressing this problem in different languages, and we introduce a novel curated dataset of fact-checks and corresponding tweets for future research.
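To make the retrieval setting concrete, here is a minimal sketch of a BM25 ranker that scores candidate fact-checks against a tweet. It is an illustration only, not the authors' implementation: the abstract does not specify the BM25 variant, tokenizer, or parameters, so the whitespace tokenizer, the smoothed IDF formula, the values `k1=1.5` and `b=0.75`, the function names, and the example texts are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption, not from the paper).
    return text.lower().split()


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score.

    k1 and b are commonly used defaults, not values reported in the paper.
    """
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    # Guard against a corpus of empty documents (avoids division by zero).
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    query_terms = tokenize(query)
    scores = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for frequent terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fact-check headlines and tweet, invented for illustration.
fact_checks = [
    "no evidence that 5g towers spread the virus",
    "the photo of the flooded airport is from 2017",
    "vaccine does not alter human dna",
]
tweet = "they say 5g towers spread the virus"
ranking = bm25_rank(tweet, fact_checks)
```

A purely lexical scorer like this can only match a tweet to a fact-check when the two share surface tokens, which is why it applies naturally to the monolingual setting; the cross-lingual (Hindi-English) setting has little token overlap to exploit and calls for multilingual embeddings instead.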