Using Subjective Logic to Estimate Uncertainty in Multi-Armed Bandit Problems

Paper Title

Using Subjective Logic to Estimate Uncertainty in Multi-Armed Bandit Problems

Authors

Fabio Massimo Zennaro, Audun Jøsang

Abstract

The multi-armed bandit problem is a classical decision-making problem where an agent has to learn an optimal action balancing exploration and exploitation. Properly managing this trade-off requires a correct assessment of uncertainty; in multi-armed bandits, as in other machine learning applications, it is important to distinguish between stochasticity that is inherent to the system (aleatoric uncertainty) and stochasticity that derives from the limited knowledge of the agent (epistemic uncertainty). In this paper we consider the formalism of subjective logic, a concise and expressive framework to express Dirichlet-multinomial models as subjective opinions, and we apply it to the problem of multi-armed bandits. We propose new algorithms grounded in subjective logic to tackle the multi-armed bandit problem, we compare them against classical algorithms from the literature, and we analyze the insights they provide in evaluating the dynamics of uncertainty. Our preliminary results suggest that subjective logic quantities enable useful assessment of uncertainty that may be exploited by more refined agents.
