Paper Title

Lifelong Learning Natural Language Processing Approach for Multilingual Data Classification

Paper Authors

Jędrzej Kozal, Michał Leś, Paweł Zyblewski, Paweł Ksieniewicz, Michał Woźniak

Paper Abstract

The abundance of information in digital media, which in today's world is the main source of knowledge about current events for the masses, makes it possible to spread disinformation on a larger scale than ever before. Consequently, there is a need to develop novel fake news detection approaches capable of adapting to changing factual contexts and generalizing previously or concurrently acquired knowledge. To deal with this problem, we propose a lifelong learning-inspired approach that allows for fake news detection in multiple languages and the mutual transfer of knowledge acquired in each of them. Both classical feature extractors, such as Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA), and deep NLP (Natural Language Processing) models based on BERT (Bidirectional Encoder Representations from Transformers) paired with an MLP (Multilayer Perceptron) classifier were employed. The results of experiments conducted on two datasets dedicated to the fake news classification task (in English and Spanish, respectively), supported by statistical analysis, confirm that using additional languages can improve the performance of traditional methods. In some cases, supplementing the deep learning method with classical ones can also positively impact the obtained results. The models' ability to generalize the knowledge acquired across the analyzed languages was also observed.
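To make the two pipelines named in the abstract concrete, below is a minimal sketch that pairs (1) classical TF-IDF features and (2) frozen multilingual BERT embeddings with an MLP classifier. The toy corpus, label encoding, the bert-base-multilingual-cased checkpoint, the mean-pooling strategy, and all hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of the abstract's two pipelines; all specifics below
# (corpus, labels, checkpoint, pooling, hyperparameters) are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

texts = ["an example genuine news article", "an example fabricated news article"]
labels = [0, 1]  # assumed encoding: 0 = real, 1 = fake

# (1) Classical pipeline: TF-IDF features fed to an MLP classifier.
classical = make_pipeline(
    TfidfVectorizer(max_features=5000),
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=500),
)
classical.fit(texts, labels)

# (2) Deep pipeline: frozen multilingual BERT embeddings fed to an MLP.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(batch_texts):
    batch = tokenizer(batch_texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**batch).last_hidden_state           # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)           # (batch, seq, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # masked mean pooling

deep = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500)
deep.fit(embed(texts), labels)
```

An LDA-based variant would follow the same pattern, e.g., by replacing the TfidfVectorizer with a CountVectorizer followed by scikit-learn's LatentDirichletAllocation to produce topic-distribution features.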
