Paper title
Higher-Order Certification for Randomized Smoothing
Paper authors
Paper abstract
Randomized smoothing is a recently proposed defense against adversarial attacks that has achieved state-of-the-art provable robustness against $\ell_2$ perturbations. A number of publications have extended the guarantees to other metrics, such as $\ell_1$ or $\ell_\infty$, by using different smoothing measures. Although the current framework has been shown to yield near-optimal $\ell_p$ radii, the total safety region it certifies can be arbitrarily small compared to the optimal one. In this work, we propose a framework to improve the certified safety region for these smoothed classifiers without changing the underlying smoothing scheme. The theoretical contributions are as follows: 1) We generalize the certification for randomized smoothing by reformulating the certified radius calculation as a nested optimization problem over a class of functions. 2) We provide a method to calculate the certified safety region using $0^{th}$-order and $1^{st}$-order information for Gaussian-smoothed classifiers. We also provide a framework that generalizes the calculation to certification using higher-order information. 3) We design efficient, high-confidence estimators for the relevant statistics of the first-order information. Combining theoretical contributions 2) and 3) allows us to certify safety regions that are significantly larger than the ones provided by current methods. On the CIFAR10 and ImageNet datasets, the new regions certified by our approach achieve significant improvements on general $\ell_1$ certified radii and on the $\ell_2$ certified radii for color-space attacks ($\ell_2$ restricted to 1 channel), while also achieving smaller improvements on the general $\ell_2$ certified radii. Our framework can also provide a way to circumvent the current impossibility results on achieving larger certified radii without requiring the use of data-dependent smoothing techniques.
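For orientation, the zeroth-order baseline that the abstract refers to as the current framework can be sketched as follows. This is a minimal illustration of the standard Gaussian-smoothing $\ell_2$ certificate, $R = \sigma\,\Phi^{-1}(\underline{p_A})$, and not the higher-order method proposed in the paper. The names `base_classifier`, `sample_counts`, and `certify_zeroth_order` are hypothetical, the toy linear classifier is an assumption, and a simple Hoeffding lower confidence bound is swapped in for the tighter binomial bounds usually used in practice.

```python
import math
import random
from statistics import NormalDist


def base_classifier(x):
    """Hypothetical toy base classifier: a fixed linear rule over 2-D inputs."""
    return 1 if 1.0 * x[0] - 0.5 * x[1] + 0.2 > 0 else 0


def sample_counts(f, x, sigma, n, rng):
    """Count the labels f assigns to n Gaussian-perturbed copies of x."""
    counts = {}
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        label = f(noisy)
        counts[label] = counts.get(label, 0) + 1
    return counts


def certify_zeroth_order(f, x, sigma, n0=100, n=20000, alpha=0.001, seed=0):
    """Return (class, certified l2 radius), or (None, 0.0) to abstain."""
    rng = random.Random(seed)
    # Step 1: guess the top class from a small, separate batch of samples.
    selection = sample_counts(f, x, sigma, n0, rng)
    top = max(selection, key=selection.get)
    # Step 2: estimate p_A = P[f(x + noise) = top] from fresh samples.
    counts = sample_counts(f, x, sigma, n, rng)
    p_hat = counts.get(top, 0) / n
    # Hoeffding lower bound on p_A: holds with probability at least 1 - alpha.
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # not confident enough to certify anything
    # Zeroth-order certificate: only the probability p_A is used.
    return top, sigma * NormalDist().inv_cdf(p_lower)


label, radius = certify_zeroth_order(base_classifier, [2.0, 0.0], sigma=0.5)
print(label, radius)
```

Because this certificate depends only on the class probability at $x$, it always yields an $\ell_2$ ball. The abstract's contribution is to enlarge the certified region by additionally using first-order (gradient) information about the smoothed classifier.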