Paper Title
Training with Streaming Annotation
Paper Authors
Paper Abstract
In this paper, we address a practical scenario where training data is released in a sequence of small-scale batches, and annotation from earlier phases is of lower quality than that from later ones. To tackle this situation, we utilize a pre-trained transformer network to preserve and integrate the most salient document information from the earlier batches while focusing on the annotation (presumably of higher quality) from the current batch. Using event extraction as a case study, we demonstrate in the experiments that our proposed framework can perform better than conventional approaches (the improvement ranges from 3.6% to 14.9% absolute F-score gain), especially when there is more noise in the early annotation; moreover, our approach reduces training time by 19.1% relative to the best conventional method.
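To make the streaming setting concrete, the following is a minimal, purely illustrative sketch (not the paper's actual framework): labeled batches arrive one at a time, earlier batches carry more label noise, and a toy logistic model is updated incrementally so that information from earlier batches persists in the parameters while each update uses only the current batch. The model, data generator, and noise schedule are all hypothetical stand-ins for the transformer-based system described above.

```python
# Illustrative sketch of training under streaming annotation.
# Assumptions (not from the paper): a 1-D toy task, a logistic model,
# and a noise schedule that improves over time.
import math
import random

random.seed(0)

def make_batch(n, noise):
    """Toy labeled data: y = 1 if x > 0, flipped with probability `noise`
    to simulate lower annotation quality in early phases."""
    batch = []
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        y = 1.0 if x > 0 else 0.0
        if random.random() < noise:  # simulated annotation error
            y = 1.0 - y
        batch.append((x, y))
    return batch

def sgd_pass(w, b, batch, lr=0.5):
    """One SGD pass of logistic regression over the current batch only."""
    for x, y in batch:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        w -= lr * (p - y) * x
        b -= lr * (p - y)
    return w, b

# Streaming schedule: annotation quality improves batch by batch.
noise_schedule = [0.3, 0.2, 0.1, 0.0]
w, b = 0.0, 0.0
for noise in noise_schedule:
    batch = make_batch(200, noise)
    # Parameters carry over across batches, preserving earlier
    # information, while each update focuses on the newest batch.
    w, b = sgd_pass(w, b, batch)

print(w > 0)  # the learned slope should recover the positive decision rule
```

In this toy form, simply continuing SGD from the previous parameters plays the role that the paper assigns to the pre-trained transformer: retaining what earlier (noisier) batches taught while weighting the model toward the current, cleaner annotation.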