Paper Title
The Influence of Dimensions on the Complexity of Computing Decision Trees

Authors

Kobourov, Stephen G., Löffler, Maarten, Montecchiani, Fabrizio, Pilipczuk, Marcin, Rutter, Ignaz, Seidel, Raimund, Sorge, Manuel, Wulms, Jules

Abstract

A decision tree recursively splits a feature space $\mathbb{R}^{d}$ and then assigns class labels based on the resulting partition. Decision trees have been part of the basic machine-learning toolkit for decades. A large body of work treats heuristic algorithms to compute a decision tree from training data, usually aiming to minimize in particular the size of the resulting tree. In contrast, little is known about the complexity of the underlying computational problem of computing a minimum-size tree for the given training data. We study this problem with respect to the number $d$ of dimensions of the feature space. We show that it can be solved in $O(n^{2d + 1}d)$ time, but under reasonable complexity-theoretic assumptions it is not possible to achieve $f(d) \cdot n^{o(d / \log d)}$ running time, where $n$ is the number of training examples. The problem is solvable in $(dR)^{O(dR)} \cdot n^{1+o(1)}$ time, if there are exactly two classes and $R$ is an upper bound on the number of tree leaves labeled with the first~class.
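To make the problem setup concrete, here is a minimal sketch (not the paper's algorithm, and all names are illustrative) of a decision tree over $\mathbb{R}^d$ with axis-parallel splits, where each inner node tests one coordinate against a threshold, each leaf carries a class label, and the quantity to minimize is the number of leaves:

```python
# Hypothetical sketch of the problem setup; not the O(n^{2d+1} d) algorithm
# from the paper. Inner nodes test one coordinate against a threshold;
# leaves assign class labels to the resulting regions of R^d.

from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: int  # class label assigned to this region


@dataclass
class Split:
    dim: int          # coordinate to test (0 <= dim < d)
    threshold: float  # go left if x[dim] <= threshold, else right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def classify(tree: Node, x) -> int:
    """Follow splits from the root down to a leaf and return its label."""
    while isinstance(tree, Split):
        tree = tree.left if x[tree.dim] <= tree.threshold else tree.right
    return tree.label


def size(tree: Node) -> int:
    """Number of leaves; minimizing this over trees consistent with the
    training data is the computational problem the abstract refers to."""
    if isinstance(tree, Leaf):
        return 1
    return size(tree.left) + size(tree.right)


# Example: a tree in R^2 with 3 leaves separating two classes.
t = Split(0, 0.5, Leaf(0), Split(1, 0.5, Leaf(1), Leaf(0)))
print(classify(t, (0.2, 0.9)))  # 0
print(classify(t, (0.8, 0.2)))  # 1
print(size(t))                  # 3
```

The parameter $R$ in the abstract's two-class result bounds, in this notation, the number of `Leaf` nodes labeled with the first class.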