Paper Title
Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization
Paper Authors
Paper Abstract
We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation. Assuming that the feature map approximately satisfies standard realizability and Bellman-closedness conditions, and that the feature vectors of all state-action pairs are representable as convex combinations of a small core set of state-action pairs, we show that our method outputs a near-optimal policy after a polynomial number of queries to the generative model. Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector, and does not need to execute computationally expensive local planning subroutines at runtime.
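To make the last point concrete, the following is a minimal sketch of what a softmax policy over linear features can look like at runtime. It is not the paper's algorithm: the abstract does not specify the exact parameterization, so the function name `softmax_policy`, the `temperature` parameter, and the random features and parameters below are all illustrative assumptions. The sketch only shows that choosing an action needs one matrix-vector product with a d-dimensional parameter vector, with no planning subroutine.

```python
import numpy as np


def softmax_policy(features, theta, temperature=1.0):
    """Action probabilities for a single state under a linear softmax policy.

    features    : array of shape (num_actions, d); row a is the feature vector phi(s, a)
    theta       : array of shape (d,); the low-dimensional policy parameter
    temperature : positive scalar (hypothetical knob, not taken from the paper)
    """
    logits = features @ theta / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Illustrative data only: random features and parameters, not a learned policy.
rng = np.random.default_rng(0)
num_actions, d = 4, 3
phi_s = rng.normal(size=(num_actions, d))
theta = rng.normal(size=d)

probs = softmax_policy(phi_s, theta)
action = rng.choice(num_actions, p=probs)  # sample an action for this state
```

Under these assumptions, storing the policy means storing only `theta`, whatever the number of states, which is the sense in which the output policy is compactly represented.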