Paper Title

Knowledge Unlearning for Mitigating Privacy Risks in Language Models

Authors

Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, Minjoon Seo

Abstract

Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both requiring re-training the underlying LM. We propose knowledge unlearning as an alternative method to reduce privacy risks for LMs post hoc. We show that simply performing gradient ascent on target token sequences is effective at forgetting them with little to no degradation of general language modeling performances for larger LMs; it sometimes even substantially improves the underlying LM with just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once and that unlearning is highly dependent on which kind of data (domain) is forgotten. By showing comparisons with a previous data preprocessing method and a decoding method known to mitigate privacy risks for LMs, we show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori while being much more efficient and robust. We release the code and dataset needed to replicate our results at https://github.com/joeljang/knowledge-unlearning.
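The abstract's core technique is gradient ascent on the target token sequences, i.e., maximizing rather than minimizing the language modeling loss on the sequences to be forgotten. Below is a minimal sketch of that idea, assuming a Hugging Face causal LM; the function name, optimizer, model name, learning rate, and step count are illustrative assumptions, not the authors' exact implementation (see the linked repository for the original code).

```python
# Minimal sketch of knowledge unlearning via gradient ascent on target token
# sequences. Assumes a Hugging Face causal LM; hyperparameters, the optimizer,
# and the fixed step count are illustrative, not the authors' exact settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def unlearn_sequences(model_name: str, target_texts, lr: float = 5e-5, steps: int = 10):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(steps):
        for text in target_texts:
            batch = tokenizer(text, return_tensors="pt")
            # Standard next-token (LM) loss on the target sequence ...
            outputs = model(**batch, labels=batch["input_ids"])
            # ... negated, so each optimizer step *ascends* the loss and
            # pushes the model away from reproducing this sequence.
            loss = -outputs.loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model


# Example usage (hypothetical model name and target string):
# model = unlearn_sequences("EleutherAI/gpt-neo-125M", ["John Doe's phone number is ..."])
```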
