Pseudo-Labels Are All You Need

Paper Title

Pseudo-Labels Are All You Need

Authors

Bogdan Kostić, Mathis Lucka, Julian Risch

Abstract

Automatically estimating the complexity of texts for readers has a variety of applications, such as recommending texts with an appropriate complexity level to language learners or supporting the evaluation of text simplification approaches. In this paper, we present our submission to the Text Complexity DE Challenge 2022, a regression task where the goal is to predict the complexity of a German sentence for German learners at level B. Our approach relies on more than 220,000 pseudo-labels created from the German Wikipedia and other corpora to train Transformer-based models, and refrains from any feature engineering or any additional, labeled data. We find that the pseudo-label-based approach gives impressive results yet requires little to no adjustment to the specific task and therefore could be easily adapted to other domains and tasks.
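To make the general idea of pseudo-labeling concrete, here is a minimal sketch of a generic teacher/student loop. It is not the authors' pipeline. The abstract does not say how the pseudo-labels were produced, so the details below are all assumptions. A closed-form ridge regressor stands in for the Transformer models, and random vectors stand in for sentence representations. The functions `fit_ridge`, `predict`, and `pseudo_label_train` are hypothetical names chosen for illustration. The only thing the sketch shows is the data flow: a teacher fitted on a small labeled set scores a large unlabeled corpus, and a student is then trained on those scores.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized bias term (stand-in model)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the bias
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def predict(w, X):
    """Apply weights (including the trailing bias) to feature rows."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def pseudo_label_train(X_labeled, y_labeled, X_unlabeled, alpha=1.0):
    """Hypothetical teacher/student loop; not the paper's exact procedure."""
    # 1. Teacher: fit on the small labeled set (e.g. challenge sentences).
    teacher = fit_ridge(X_labeled, y_labeled, alpha)
    # 2. Pseudo-label the large unlabeled corpus (e.g. Wikipedia sentences).
    y_pseudo = predict(teacher, X_unlabeled)
    # 3. Student: train on pseudo-labeled plus original labeled data
    #    (one common variant; assumed here, not stated in the abstract).
    X_all = np.vstack([X_labeled, X_unlabeled])
    y_all = np.concatenate([y_labeled, y_pseudo])
    student = fit_ridge(X_all, y_all, alpha)
    return student, y_pseudo
```

In a real setting, the student would typically be a larger or differently initialized model than the teacher. Using the same linear model for both keeps the sketch short, but it means the student adds little beyond the teacher here.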
