Paper Title
Instance-Based Uncertainty Estimation for Gradient-Boosted Regression Trees
Paper Authors
Paper Abstract
Gradient-boosted regression trees (GBRTs) are hugely popular for solving tabular regression problems, but provide no estimate of uncertainty. We propose Instance-Based Uncertainty estimation for Gradient-boosted regression trees (IBUG), a simple method for extending any GBRT point predictor to produce probabilistic predictions. IBUG computes a non-parametric distribution around a prediction using the $k$-nearest training instances, where distance is measured with a tree-ensemble kernel. The runtime of IBUG depends on the number of training examples at each leaf in the ensemble, and can be improved by sampling trees or training instances. Empirically, we find that IBUG achieves similar or better performance than the previous state-of-the-art across 22 benchmark regression datasets. We also find that IBUG can achieve improved probabilistic performance by using different base GBRT models, and can more flexibly model the posterior distribution of a prediction than competing methods. We also find that previous methods suffer from poor probabilistic calibration on some datasets, which can be mitigated using a scalar factor tuned on the validation data. Source code is available at https://www.github.com/jjbrophy47/ibug.
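To make the method concrete, below is a minimal sketch of the IBUG idea, not the authors' implementation (see the linked repository for that). It uses scikit-learn's GradientBoostingRegressor as the base GBRT, measures affinity as the number of trees in which a test instance and a training instance share a leaf, summarizes the k highest-affinity training targets with their empirical standard deviation (the actual method can fit any output distribution from these neighbors), and tunes a scalar calibration factor on validation NLL. Names such as `k`, `gamma`, and `neighbor_std` are illustrative assumptions.

```python
# Hypothetical sketch of IBUG-style probabilistic prediction on top of a
# GBRT point predictor; assumes a Gaussian summary of the neighbors.
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=600, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_tr, y_tr, test_size=0.25,
                                              random_state=0)

gbrt = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X_fit, y_fit)
fit_leaves = gbrt.apply(X_fit)  # (n_fit, n_trees): leaf index per tree

def neighbor_std(X_query, k=20):
    """Empirical std of the k highest-affinity training targets, where
    affinity = number of trees in which the query shares a leaf."""
    query_leaves = gbrt.apply(X_query)
    stds = np.empty(len(X_query))
    for i, leaves in enumerate(query_leaves):
        affinity = (fit_leaves == leaves).sum(axis=1)  # tree-ensemble kernel
        knn_idx = np.argsort(-affinity)[:k]            # k nearest instances
        stds[i] = y_fit[knn_idx].std() + 1e-12
    return stds

# Scalar calibration factor gamma, tuned on validation NLL under a
# Gaussian centered at the GBRT point prediction.
mu_val, std_val = gbrt.predict(X_val), neighbor_std(X_val)
gammas = np.linspace(0.25, 4.0, 50)
gamma = gammas[np.argmin([-norm.logpdf(y_val, mu_val, g * std_val).mean()
                          for g in gammas])]

# Probabilistic prediction on the test set.
mu_te, std_te = gbrt.predict(X_te), gamma * neighbor_std(X_te)
print("test NLL:", -norm.logpdf(y_te, mu_te, std_te).mean())
```

Note the runtime property the abstract mentions: the loop cost is driven by comparing leaf indices against all training instances, so sampling trees (columns of `fit_leaves`) or training instances (rows) trades accuracy for speed.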