Paper Title
TorchOpt: An Efficient Library for Differentiable Optimization
Paper Authors
Paper Abstract
Recent years have witnessed a boom in differentiable optimization algorithms. These algorithms exhibit diverse execution patterns, and their execution requires massive computational resources that go beyond a single CPU or GPU. Existing differentiable optimization libraries, however, cannot support efficient algorithm development and multi-CPU/GPU execution, making the development of differentiable optimization algorithms often cumbersome and expensive. This paper introduces TorchOpt, an efficient PyTorch-based library for differentiable optimization. TorchOpt provides a unified and expressive programming abstraction for differentiable optimization. This abstraction allows users to efficiently declare and analyze differentiable optimization programs that use explicit gradients, implicit gradients, or zero-order gradients. TorchOpt further provides a high-performance distributed execution runtime that can fully parallelize computation-intensive differentiation operations (e.g., tensor tree flattening) on CPUs/GPUs and automatically distribute computation across devices. Experimental results show that TorchOpt achieves a $5.2\times$ speedup in training time on an 8-GPU server. TorchOpt is available at https://github.com/metaopt/torchopt/.
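
To make the explicit-gradient part of the abstraction concrete, below is a minimal sketch of a MAML-style inner loop using TorchOpt's differentiable MetaAdam optimizer, following the usage pattern in the TorchOpt README. The model, data, and losses here are illustrative placeholders, not from the paper; the implicit- and zero-order-gradient modes use separate interfaces that are not shown.

import torch
import torch.nn.functional as F
import torchopt

# Illustrative placeholder model and data (assumptions, not from the paper).
net = torch.nn.Linear(4, 2)
xs, ys = torch.randn(8, 4), torch.randint(0, 2, (8,))

# MetaAdam applies differentiable parameter updates, so outer-loop
# gradients can flow back through the inner optimization step.
inner_opt = torchopt.MetaAdam(net, lr=0.1)

# Inner step: update the network on the inner-loop data.
inner_loss = F.cross_entropy(net(xs), ys)
inner_opt.step(inner_loss)

# Outer step: backpropagate through the inner update to obtain
# meta-gradients with respect to the initial parameters.
outer_loss = F.cross_entropy(net(xs), ys)
outer_loss.backward()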