Paper Title
Ask Question First for Enhancing Lifelong Language Learning
Paper Authors
Paper Abstract
Lifelong language learning aims to learn NLP tasks in a stream while retaining knowledge of previous tasks. Previous works based on a language model and following the data-free constraint have explored formatting all data as "begin token (\textit{B}) + context (\textit{C}) + question (\textit{Q}) + answer (\textit{A})" for different tasks. However, they still suffer from catastrophic forgetting, which is exacerbated when pseudo data for previous tasks is insufficient, for the following reasons: (1) the model has difficulty generating pseudo data that correspond to each task, and (2) \textit{A} is prone to error when \textit{A} and \textit{C} are separated by \textit{Q}, because the information of \textit{C} is diminished before \textit{A} is generated. Therefore, we propose Ask Question First and Replay Question (AQF-RQ), which comprises a novel data format "\textit{BQCA}" and a new training task that replays pseudo questions of previous tasks. Experimental results demonstrate that AQF-RQ makes it easier for the model to generate more pseudo data that match the corresponding tasks, and that it is more robust to both sufficient and insufficient pseudo data, whether the task boundary is clear or unclear. AQF-RQ achieves performance only 0.36\% lower than that of multi-task learning.
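To make the two serializations concrete, the following is a minimal sketch (not the authors' code) of how a training example could be formatted under the baseline \textit{BCQA} layout versus the proposed \textit{BQCA} layout; the special-token strings and the example text are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the two data formats discussed in the abstract.
# The special tokens "__BEGIN__" / "__EOS__" are placeholders assumed for
# illustration; the paper's actual tokenization may differ.

def format_bcqa(context: str, question: str, answer: str) -> str:
    """Baseline format: begin token + context + question + answer."""
    return f"__BEGIN__ {context} {question} {answer} __EOS__"


def format_bqca(context: str, question: str, answer: str) -> str:
    """Ask-Question-First format: the question precedes the context,
    so the context immediately precedes the answer."""
    return f"__BEGIN__ {question} {context} {answer} __EOS__"


if __name__ == "__main__":
    ctx = "The film was released in 1994 and won several awards."
    q = "When was the film released?"
    a = "1994"
    print(format_bcqa(ctx, q, a))  # ... context question answer ...
    print(format_bqca(ctx, q, a))  # ... question context answer ...
```

In the \textit{BQCA} sketch, no question tokens intervene between the context and the answer, which illustrates the abstract's point that the context information is less diminished before the answer is generated.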