Title

RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour

Authors

Erick Mendez Guzman, Viktor Schlegel, Riza Batista-Navarro

Abstract

Forced labour is the most common type of modern slavery, and it is increasingly gaining the attention of the research and social communities. Recent studies suggest that artificial intelligence (AI) holds immense potential for augmenting anti-slavery action. However, AI tools need to be developed transparently in cooperation with different stakeholders. Such tools are contingent on the availability of, and access to, domain-specific data, which are scarce due to the near-invisible nature of forced labour. To the best of our knowledge, this paper presents the first openly accessible English corpus annotated for multi-class and multi-label forced labour detection. The corpus consists of 989 news articles retrieved from specialised data sources and annotated according to risk indicators defined by the International Labour Organization (ILO). Each news article was annotated for two aspects: (1) indicators of forced labour as classification labels and (2) snippets of the text that justify the labelling decisions. We hope that our data set can help promote research on explainability for multi-class and multi-label text classification. In this work, we explain our process for collecting the data underpinning the proposed corpus, describe our annotation guidelines and present some statistical analysis of its content. Finally, we summarise the results of baseline experiments based on different variants of the Bidirectional Encoder Representations from Transformers (BERT) model.
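The annotation scheme described above pairs each article with a set of indicator labels and free-text rationale snippets, and the baselines are evaluated as a multi-label classification task. The following is a minimal sketch of what such a record and a standard multi-label metric (micro-averaged F1) might look like; the field names and any indicator strings are illustrative assumptions, not the corpus's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotatedArticle:
    # Hypothetical record shape for one annotated article; the real
    # corpus format may differ.
    text: str
    labels: set[str] = field(default_factory=set)        # ILO risk indicators
    rationales: list[str] = field(default_factory=list)  # justifying snippets

def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Micro-averaged F1 over per-article label sets, a common way to
    score multi-label classifiers such as the BERT baselines."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # correctly predicted labels
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # spurious labels
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # missed labels
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, if an article's gold labels are {"abusive working conditions", "intimidation"} and a model predicts only the first, that article contributes one true positive and one false negative to the micro-averaged score.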
