Paper Title
Pre-trained Language Model Based Active Learning for Sentence Matching
Paper Authors
Paper Abstract
Active learning can significantly reduce the annotation cost of data-driven techniques. However, previous active learning approaches for natural language processing depend mainly on the entropy-based uncertainty criterion and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Unlike previous active learning approaches, it provides linguistic criteria for measuring instances and helps select more efficient instances for annotation. Experiments demonstrate that our approach achieves greater accuracy with fewer labeled training instances.
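For context, the entropy-based uncertainty criterion mentioned in the abstract can be sketched as below. This is a minimal illustration of that baseline criterion only, not the linguistic criteria the paper proposes. The function names `entropy` and `select_most_uncertain` and the probability values are hypothetical, chosen purely for illustration.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool instances with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for four unlabeled sentence pairs
# (probability of "match" vs. "no match"); not data from the paper.
pool_probs = [[0.98, 0.02], [0.55, 0.45], [0.80, 0.20], [0.50, 0.50]]

# The two pairs the model is least sure about are sent for annotation.
selected = select_most_uncertain(pool_probs, k=2)
print(selected)
```

A criterion of this kind looks only at the model's output distribution, which is the limitation the abstract points to: it does not take the characteristics of the sentences themselves into account.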