Reward Design For An Online Reinforcement Learning Algorithm Supporting Oral Self-Care

Paper Title

Reward Design For An Online Reinforcement Learning Algorithm Supporting Oral Self-Care

Authors

Trella, Anna L.; Zhang, Kelly W.; Nahum-Shani, Inbal; Shetty, Vivek; Doshi-Velez, Finale; Murphy, Susan A.

Abstract

Dental disease is one of the most common chronic diseases despite being largely preventable. However, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for use in optimizing the delivery of mobile-based prompts to encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that the algorithm considers the impact of the current action on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been made simple in order to run stably and autonomously in a constrained, real-world setting (i.e., highly noisy, sparse data). We address this challenge by designing a quality reward which maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the hyperparameters of the reward by building a simulation environment test bed and evaluating candidates using the test bed. The RL algorithm discussed in this paper will be deployed in Oralytics, an oral self-care app that provides behavioral strategies to boost patient engagement in oral hygiene practices.
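
The abstract describes a reward that trades the health outcome (brushing quality) against user burden. A minimal sketch of that idea follows. It is an illustration only, not the paper's actual reward definition: the function name, the use of a recent prompt rate as a burden proxy, and the default values of `burden_weight` and `rate_threshold` are all assumptions made here. The two defaults stand in for the kind of reward hyperparameters the abstract says are tuned with a simulation test bed.

```python
def shaped_reward(
    brushing_quality: float,
    sent_prompt: bool,
    recent_prompt_rate: float,
    burden_weight: float = 30.0,    # hypothetical hyperparameter
    rate_threshold: float = 0.5,    # hypothetical hyperparameter
) -> float:
    """Hypothetical reward: observed brushing quality minus a burden penalty.

    The penalty applies only when a prompt is sent while the user has
    already been prompted often recently, a simple proxy for the delayed
    negative effect of over-prompting.
    """
    over_prompted = sent_prompt and recent_prompt_rate > rate_threshold
    penalty = burden_weight if over_prompted else 0.0
    return brushing_quality - penalty
```

Under these assumptions, sending a prompt to a user who has been prompted frequently lowers the reward the algorithm learns from, which nudges even a simple algorithm away from actions that erode the effectiveness of future prompts.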
