Paper Title

Treebank Embedding Vectors for Out-of-domain Dependency Parsing

Authors

Joachim Wagner, James Barry, Jennifer Foster

Abstract

A recent advance in monolingual dependency parsing is the idea of a treebank embedding vector, which allows all treebanks for a particular language to be used as training data while at the same time allowing the model to prefer training data from one treebank over others and to select the preferred treebank at test time. We build on this idea by 1) introducing a method to predict a treebank vector for sentences that do not come from a treebank used in training, and 2) exploring what happens when we move away from predefined treebank embedding vectors during test time and instead devise tailored interpolations. We show that 1) there are interpolated vectors that are superior to the predefined ones, and 2) treebank vectors can be predicted with sufficient accuracy, for nine out of ten test languages, to match the performance of an oracle approach that knows the most suitable predefined treebank embedding for the test set.
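To make the idea of an interpolated treebank vector concrete, here is a minimal Python sketch. It assumes that an interpolation is a weighted sum of the predefined treebank vectors with weights that sum to 1; the abstract does not spell out the exact scheme, so this is one plausible reading, not the authors' implementation. The function name, the treebank names, and the two-dimensional vectors are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def interpolate_treebank_vectors(
    vectors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return the weighted sum of predefined treebank vectors.

    Assumption (not from the paper): weights cover every known
    treebank and sum to 1.
    """
    if set(vectors) != set(weights):
        raise ValueError("weights must cover exactly the known treebanks")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")

    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for name, vec in vectors.items():
        if len(vec) != dim:
            raise ValueError("all treebank vectors must share one dimension")
        # Accumulate this treebank's contribution, scaled by its weight.
        for i, value in enumerate(vec):
            result[i] += weights[name] * value
    return result


# Hypothetical predefined vectors for two treebanks of one language.
predefined = {
    "treebank_a": [1.0, 0.0],
    "treebank_b": [0.0, 1.0],
}

# A tailored interpolation that leans towards treebank_a.
mixed = interpolate_treebank_vectors(
    predefined, {"treebank_a": 0.75, "treebank_b": 0.25}
)
```

Under this reading, giving one treebank a weight of 1 and the others 0 recovers that treebank's predefined vector, so the predefined choices are special cases of the interpolation.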
