Asymptotic Normality for Multivariate Random Forest Estimators

Paper Title

Asymptotic Normality for Multivariate Random Forest Estimators

Author

Li, Kevin

Abstract

Regression trees and random forests are popular and effective non-parametric estimators in practical applications. A recent paper by Athey and Wager shows that the random forest estimate at any point is asymptotically Gaussian; in this paper, we extend this result to the multivariate case and show that the vector of estimates at multiple points is jointly normal. Specifically, the covariance matrix of the limiting normal distribution is diagonal, so that the estimates at any two points are independent in sufficiently deep trees. Moreover, the off-diagonal term is bounded by quantities capturing how likely it is that two points belong to the same partition of the resulting tree. Our results rely on a certain stability property when constructing splits, and we give examples of splitting rules for which this assumption is and is not satisfied. We test our proposed covariance bound and the associated coverage rates of confidence intervals in numerical simulations.
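
For readers who prefer symbols, the main claim of the abstract can be sketched as follows. The notation is an assumption introduced here, not taken from the paper: \hat\mu_n(x) denotes the forest estimate at a point x built from n samples, \mu(x) the target regression function, \sigma_n(x) a scaling sequence, and x_1, ..., x_q a fixed set of query points. The centering and scaling mirror the usual univariate statement; the paper's precise conditions and the exact form of its off-diagonal bound are not given in the abstract.

```latex
% Joint asymptotic normality at q fixed points (assumed notation)
\[
\left(
  \frac{\hat\mu_n(x_1) - \mu(x_1)}{\sigma_n(x_1)},
  \;\dots,\;
  \frac{\hat\mu_n(x_q) - \mu(x_q)}{\sigma_n(x_q)}
\right)^{\top}
\xrightarrow{\;d\;} \mathcal{N}\!\left(0,\, I_q\right)
\]

% Diagonal limiting covariance: estimates at distinct points decorrelate
\[
\operatorname{Corr}\!\left(\hat\mu_n(x_j),\, \hat\mu_n(x_k)\right)
\longrightarrow 0
\qquad \text{for } j \neq k .
\]
```

Read this way, the diagonal limit says that a confidence region for several points can be assembled from the pointwise intervals, while the finite-sample off-diagonal terms are what the abstract's covariance bound controls.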

Paper Title