Paper Title

Seeds Don't Lie: An Adaptive Watermarking Framework for Computer Vision Models

Paper Authors

Jacob Shams, Ben Nassi, Ikuya Morikawa, Toshiya Shimizu, Asaf Shabtai, Yuval Elovici

Paper Abstract

In recent years, various watermarking methods have been suggested to detect computer vision models obtained illegitimately from their owners; however, they fail to demonstrate satisfactory robustness against model extraction attacks. In this paper, we present an adaptive framework to watermark a protected model, leveraging the unique behavior the model exhibits due to the random seed used to initialize its training. This watermark is used to detect extracted models, which exhibit the same unique behavior, indicating unauthorized usage of the protected model's intellectual property (IP). First, we show how the initial seed for random number generation during model training produces distinct characteristics in the model's decision boundaries, which are inherited by extracted models and present in their decision boundaries, but are not present in non-extracted models trained on the same dataset with a different seed. Based on our findings, we suggest the Robust Adaptive Watermarking (RAW) Framework, which utilizes the unique behavior present in the protected and extracted models to generate a watermark key set and a verification model. We show that the framework is robust to (1) unseen model extraction attacks, and (2) extracted models that undergo a blurring method (e.g., weight pruning). We evaluate the framework's robustness against a naive attacker (unaware that the model is watermarked) and an informed attacker (who employs blurring strategies to remove the watermarked behavior from an extracted model), and achieve outstanding (i.e., >0.9) AUC values. Finally, we show that the framework is robust to model extraction attacks in which the extracted model's structure and/or architecture differs from that of the protected model.
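The core observation above (the training seed leaves seed-specific traces near the decision boundary, and an extracted copy inherits them while an independently trained model does not) can be illustrated with a small, self-contained experiment. The sketch below is not the authors' RAW Framework: the dataset, model sizes, the naive extraction step, the confidence threshold, and the simple agreement score are all illustrative assumptions.

```python
# Conceptual sketch, assuming a toy 2-D dataset and small MLPs:
# seed-specific near-boundary behavior is inherited by an extracted
# (distilled) copy, but not by an independently trained model.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=2000, noise=0.25, random_state=0)

# Protected model (seed A) and an independent model (seed B), same data.
protected = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                          random_state=1).fit(X, y)
independent = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                            random_state=2).fit(X, y)

# Naive model extraction: the attacker queries the protected model and
# trains a surrogate on its predicted labels (stand-in for a real attack).
X_query, _ = make_moons(n_samples=2000, noise=0.25, random_state=3)
extracted = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                          random_state=4).fit(X_query,
                                              protected.predict(X_query))

# Candidate watermark key set: noisy points kept only where the protected
# model is low-confidence, i.e., near its (seed-specific) decision boundary.
candidates, _ = make_moons(n_samples=5000, noise=0.6, random_state=5)
conf = protected.predict_proba(candidates).max(axis=1)
key_set = candidates[conf < 0.7]

def agreement(model_a, model_b, pts):
    """Fraction of key points on which the two models predict the same label."""
    return np.mean(model_a.predict(pts) == model_b.predict(pts))

print("extracted   vs protected:", agreement(extracted, protected, key_set))
print("independent vs protected:", agreement(independent, protected, key_set))
# Expected outcome: the extracted model agrees with the protected model on
# many more near-boundary keys than the independently seeded model does.
```

In the paper's framework, this kind of near-boundary key set is paired with a trained verification model rather than a raw agreement score, which is what yields the reported >0.9 AUC against both naive and informed attackers.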
