Paper Title
User Behavior Retrieval for Click-Through Rate Prediction
Authors
Abstract
Click-through rate (CTR) prediction plays a key role in modern online personalization services. In practice, building an accurate CTR prediction model requires capturing users' drifting interests by modeling their sequential behaviors. However, as users accumulate more and more behavioral data on a platform, it becomes non-trivial for sequential models to make use of each user's whole behavior history. First, directly feeding in the long behavior sequence makes online inference time and system load infeasible. Second, such long histories contain so much noise that sequential model learning can fail. Current industrial solutions mainly truncate the sequences and feed only recent behaviors to the prediction model. This causes a problem: sequential patterns such as periodicity or long-term dependency may be embedded not in the most recent behaviors but far back in the history. To tackle these issues, in this paper we approach the problem from the data perspective, instead of just designing more sophisticated yet complicated models, and propose the User Behavior Retrieval for CTR prediction (UBR4CTR) framework. In UBR4CTR, the most relevant and appropriate user behaviors are first retrieved from the entire user history sequence using a learnable search method. These retrieved behaviors, rather than simply the most recent ones, are then fed into a deep model to make the final prediction. UBR4CTR is highly feasible to deploy into an industrial model pipeline at low cost. Experiments on three real-world large-scale datasets demonstrate the superiority and efficacy of our proposed framework and models.
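To make the contrast between truncation and retrieval concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the learnable search works, so the feature-matching score, the `weights` dictionary (a stand-in for learned parameters), and all function and field names below are assumptions chosen for illustration only.

```python
from typing import Dict, List

Behavior = Dict[str, str]  # hypothetical representation: feature name -> value


def truncate_recent(history: List[Behavior], k: int) -> List[Behavior]:
    """Baseline: keep only the k most recent behaviors (history is oldest-first)."""
    return history[-k:] if k > 0 else []


def retrieve_relevant(
    history: List[Behavior],
    target: Behavior,
    weights: Dict[str, float],
    k: int,
) -> List[Behavior]:
    """Keep the k behaviors whose features best match the target.

    `weights` stands in for learned parameters (an assumption of this sketch);
    a feature contributes its weight when a behavior shares the target's value.
    """

    def score(behavior: Behavior) -> float:
        return sum(
            w
            for feature, w in weights.items()
            if feature in target and behavior.get(feature) == target[feature]
        )

    # Rank by score, breaking ties in favor of more recent behaviors.
    ranked = sorted(
        range(len(history)),
        key=lambda i: (score(history[i]), i),
        reverse=True,
    )
    # Restore chronological order before handing off to a sequence model.
    return [history[i] for i in sorted(ranked[:k])]


# Toy history, oldest first: the outdoor purchases are far back in time.
history = [
    {"item": "tent", "category": "outdoor"},
    {"item": "sleeping_bag", "category": "outdoor"},
    {"item": "phone_case", "category": "electronics"},
    {"item": "charger", "category": "electronics"},
    {"item": "headphones", "category": "electronics"},
]
target = {"item": "camp_stove", "category": "outdoor"}
weights = {"category": 1.0, "item": 2.0}

recent = truncate_recent(history, 2)
retrieved = retrieve_relevant(history, target, weights, 2)
print([b["item"] for b in recent])
print([b["item"] for b in retrieved])
```

In this toy example, truncation keeps only the two latest electronics purchases, while retrieval surfaces the older outdoor purchases that match the target item's category. In either case, the selected behaviors would then be passed to the downstream prediction model.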