Paper Title

Augmentation Invariant Manifold Learning

Author

Wang, Shulei

Abstract

Data augmentation is a widely used technique and an essential ingredient in recent advances in self-supervised representation learning. By preserving the similarity between augmented data, the resulting data representation can improve various downstream analyses and achieve state-of-the-art performance in many applications. Despite this empirical effectiveness, most existing methods lack theoretical understanding under a general nonlinear setting. To fill this gap, we develop a statistical framework on a low-dimensional product manifold to model the data augmentation transformation. Under this framework, we introduce a new representation learning method called augmentation invariant manifold learning and design a computationally efficient algorithm by reformulating it as a stochastic optimization problem. Compared with existing self-supervised methods, the new method simultaneously exploits the manifold's geometric structure and the invariance property of augmented data, and it has an explicit theoretical guarantee. Our theoretical investigation characterizes the role of data augmentation in the proposed method and reveals why and how the data representation learned from augmented data can improve the $k$-nearest neighbor classifier in the downstream analysis, showing that more complex data augmentation leads to greater improvement in downstream analysis. Finally, numerical experiments on simulated and real data sets are presented to demonstrate the merits of the proposed method.
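The abstract does not spell out the algorithm, but the two ingredients it names — an invariance loss over augmented pairs, solved by stochastic optimization — can be illustrated with a deliberately simplified toy. The sketch below is *not* the paper's method: the linear encoder, the specific augmentation, and the QR-based anti-collapse constraint are all assumptions made here for illustration. It shows how minimizing the representation distance between two augmentations of the same point can recover the manifold directions while discarding the directions the augmentation perturbs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 256 points on a circle (a 1-D manifold) embedded in R^5;
# coordinates 2-4 are pure nuisance directions.
n = 256
t = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.zeros((n, 5))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)

def augment(x):
    # Hypothetical augmentation: noise confined to the nuisance
    # coordinates, so each point's position on the manifold is preserved.
    out = x.copy()
    out[:, 2:] += 0.5 * rng.standard_normal((len(x), 3))
    return out

# Stochastic optimization of a linear encoder W : R^5 -> R^2 that
# minimizes the invariance loss ||f(v1) - f(v2)||^2 over augmented
# pairs.  Re-orthonormalizing W with QR after each step is one simple
# way to rule out the collapsed solution W = 0.
W = np.linalg.qr(rng.standard_normal((5, 2)))[0]
lr = 0.5
for step in range(300):
    idx = rng.choice(n, size=32, replace=False)
    v1, v2 = augment(X[idx]), augment(X[idx])
    diff = (v1 - v2) @ W                  # z1 - z2 for the mini-batch
    grad = (v1 - v2).T @ diff / len(idx)  # gradient of the invariance loss
    W = np.linalg.qr(W - lr * grad)[0]    # SGD step + re-orthonormalize

# The learned encoder suppresses the augmented (nuisance) directions ...
nuisance_weight = float(np.abs(W[2:]).max())
# ... so representations of fresh augmentations of the same point agree.
z1, z2 = augment(X) @ W, augment(X) @ W
align_err = float(np.mean(np.sum((z1 - z2) ** 2, axis=1)))
```

The resulting representations `X @ W` depend only on the manifold coordinate, which is the setting in which the abstract's theory analyzes the downstream $k$-nearest neighbor classifier: neighbors in representation space are neighbors on the manifold, not neighbors in the noisy ambient space.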
