Paper Title
TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation
Paper Authors
Paper Abstract
Natural Language Generation (NLG) models are prone to generating repetitive utterances. In this work, we study the repetition problem for encoder-decoder models, using both recurrent neural network (RNN) and transformer architectures. To this end, we consider the chit-chat task, where the problem is more prominent than in other tasks that need encoder-decoder architectures. We first study the influence of model architectures. By using pre-attention and highway connections for RNNs, we manage to achieve lower repetition rates. However, this method does not generalize to other models such as transformers. We hypothesize that the deeper reason is that the training corpora contain hard tokens that are more difficult for a generative model to learn than others; once training has finished, hard tokens remain under-learned, so repetitive generation is more likely to occur. Based on this hypothesis, we propose token loss dynamic reweighting (TLDR), which applies differentiable weights to individual token losses. By using higher weights for hard tokens and lower weights for easy tokens, NLG models are able to learn individual tokens at different paces. Experiments on chit-chat benchmark datasets show that TLDR is more effective at reducing repetition for both RNN and transformer architectures than baselines using different weighting functions.
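To make the core idea concrete, below is a minimal sketch of token-loss dynamic reweighting for a sequence cross-entropy objective, written in PyTorch. The specific weighting function used here (each token's weight is its own loss normalized by the batch's mean token loss, so high-loss "hard" tokens get weight greater than 1 and "easy" tokens get weight less than 1) is an illustrative assumption; the abstract only states that the weights are differentiable and favor hard tokens, and the paper compares several weighting functions.

```python
import torch
import torch.nn.functional as F


def tldr_loss(logits, targets, pad_id=0):
    """Reweighted sequence loss sketch.

    logits:  (batch, seq_len, vocab_size) decoder outputs
    targets: (batch, seq_len) gold token ids, padded with pad_id
    """
    vocab_size = logits.size(-1)

    # Per-token cross-entropy with no reduction, so every token keeps its own loss.
    token_loss = F.cross_entropy(
        logits.reshape(-1, vocab_size),
        targets.reshape(-1),
        reduction="none",
    ).view(targets.shape)

    # Ignore padding positions when averaging and weighting.
    mask = (targets != pad_id).float()
    num_tokens = mask.sum().clamp(min=1.0)

    # Differentiable weights: per-token loss normalized to have mean 1 over
    # non-pad tokens. Hard tokens (high loss) get weight > 1, easy tokens < 1.
    # This choice of weighting function is an assumption for illustration.
    mean_loss = (token_loss * mask).sum() / num_tokens
    weights = token_loss / (mean_loss + 1e-8)

    # Weighted average of individual token losses.
    return (weights * token_loss * mask).sum() / num_tokens
```

Because the weights are themselves a function of the per-token losses and are not detached from the graph, gradients flow through them, matching the abstract's description of "differentiable weights"; whether to detach the weights is a design choice the abstract does not settle.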