Paper Title
Aggregating Pairwise Semantic Differences for Few-Shot Claim Veracity Classification
Paper Authors
Paper Abstract
As part of an automated fact-checking pipeline, the claim veracity classification task consists of determining whether a claim is supported by an associated piece of evidence. The complexity of gathering labelled claim-evidence pairs leads to a scarcity of datasets, particularly when dealing with new domains. In this paper, we introduce SEED, a novel vector-based method for few-shot claim veracity classification that aggregates pairwise semantic differences for claim-evidence pairs. We build on the hypothesis that we can simulate class representative vectors that capture average semantic differences for claim-evidence pairs in a class, which can then be used for classification of new instances. We compare the performance of our method with competitive baselines, including fine-tuned BERT/RoBERTa models, as well as the state-of-the-art few-shot veracity classification method that leverages language model perplexity. Experiments conducted on the FEVER and SCIFACT datasets show consistent improvements over competitive baselines in few-shot settings. Our code is available.
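The core idea in the abstract (class representative vectors built from average pairwise differences, then used to classify new pairs) can be illustrated with a minimal sketch. This is not the authors' released code. It assumes that claim and evidence sentences have already been embedded as fixed-size vectors, that the "semantic difference" is a plain element-wise subtraction, and that a new pair is assigned to the class whose representative vector is nearest in Euclidean distance. The function names, the labels, and the toy 2-dimensional vectors are all hypothetical.

```python
import numpy as np


def class_representatives(claim_vecs, evidence_vecs, labels):
    """Average the claim-minus-evidence difference vectors per class.

    Assumption: the pairwise semantic difference is an element-wise
    subtraction of precomputed sentence embeddings.
    """
    diffs = np.asarray(claim_vecs, dtype=float) - np.asarray(evidence_vecs, dtype=float)
    label_arr = np.asarray(labels)
    return {
        label: diffs[label_arr == label].mean(axis=0)
        for label in sorted(set(labels))
    }


def classify(claim_vec, evidence_vec, representatives):
    """Return the label whose representative vector is closest to the new pair.

    Assumption: closeness is measured with Euclidean distance.
    """
    diff = np.asarray(claim_vec, dtype=float) - np.asarray(evidence_vec, dtype=float)
    return min(
        representatives,
        key=lambda label: np.linalg.norm(diff - representatives[label]),
    )


# Toy few-shot support set: two labelled pairs per class (made-up vectors).
claims = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
evidence = [[1.0, 0.0], [0.8, 0.2], [1.0, 0.0], [0.9, 0.1]]
labels = ["SUPPORTS", "SUPPORTS", "REFUTES", "REFUTES"]

reps = class_representatives(claims, evidence, labels)
prediction = classify([0.2, 0.8], [1.0, 0.0], reps)
```

In practice the toy vectors would be replaced by embeddings from a pretrained language model. The sketch only shows why a handful of labelled pairs per class is enough to build a classifier of this kind: there are no trainable parameters, only per-class averages.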