Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports

Paper title

Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports

Authors

Aleksandra Edwards, David Rogers, Jose Camacho-Collados, Hélène de Ribaupierre, Alun Preece

Abstract

The task of text and sentence classification is associated with the need for large amounts of labelled training data. The acquisition of high volumes of labelled datasets can be expensive or unfeasible, especially for highly-specialised domains for which documents are hard to obtain. Research on the application of supervised classification based on small amounts of training data is limited. In this paper, we address the combination of state-of-the-art deep learning and classification methods and provide an insight into which combination of methods fits the needs of small, domain-specific, and terminologically-rich corpora. We focus on a real-world scenario related to a collection of safeguarding reports comprising learning experiences and reflections on tackling serious incidents involving children and vulnerable adults. The relatively small volume of available reports and their use of highly domain-specific terminology make the application of automated approaches difficult. We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches. Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
