Fast Sparse Classification for Generalized Linear and Additive Models

Title

Fast Sparse Classification for Generalized Linear and Additive Models

Authors

Jiachang Liu, Chudi Zhong, Margo Seltzer, Cynthia Rudin

Abstract

We present fast classification techniques for sparse generalized linear and additive models. These techniques can handle thousands of features and thousands of observations in minutes, even in the presence of many highly correlated features. For fast sparse logistic regression, our computational speed-up over other best-subset search techniques owes to linear and quadratic surrogate cuts for the logistic loss that allow us to efficiently screen features for elimination, as well as use of a priority queue that favors a more uniform exploration of features. As an alternative to the logistic loss, we propose the exponential loss, which permits an analytical solution to the line search at each iteration. Our algorithms are generally 2 to 5 times faster than previous approaches. They produce interpretable models that have accuracy comparable to black box models on challenging datasets.
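
The abstract's remark that the exponential loss admits an analytical line search can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single binary feature with values in {0, 1}, labels in {-1, +1}, and no regularization term, in which case the optimal coordinate step has the familiar closed form d = 0.5 * ln(W+ / W-). The function names `exp_loss` and `closed_form_step` are hypothetical.

```python
import math


def exp_loss(y, scores):
    """Exponential loss: sum over samples of exp(-y_i * f_i)."""
    return sum(math.exp(-yi * si) for yi, si in zip(y, scores))


def closed_form_step(x_j, y, scores):
    """Optimal step d on the coefficient of one binary feature.

    Assumptions (not taken from the paper): x_j values are 0 or 1,
    labels are -1 or +1, and there is no penalty term.
    Minimizing W_pos * exp(-d) + W_neg * exp(d) over d gives
    d = 0.5 * ln(W_pos / W_neg).
    """
    w_pos = 0.0  # total weight of active samples with label +1
    w_neg = 0.0  # total weight of active samples with label -1
    for xi, yi, si in zip(x_j, y, scores):
        if xi != 1:
            continue  # samples with x_j = 0 are unaffected by the step
        weight = math.exp(-yi * si)
        if yi == 1:
            w_pos += weight
        else:
            w_neg += weight
    if w_pos == 0.0 or w_neg == 0.0:
        # The minimizer is unbounded when one class is absent.
        raise ValueError("closed form needs both classes among active samples")
    return 0.5 * math.log(w_pos / w_neg)


# Toy data: five samples, current scores all zero.
x_j = [1, 1, 1, 0, 1]
y = [1, 1, -1, -1, 1]
scores = [0.0] * 5

d = closed_form_step(x_j, y, scores)
new_scores = [s + d * xi for s, xi in zip(scores, x_j)]
print(d, exp_loss(y, scores), exp_loss(y, new_scores))
```

With the logistic loss no such closed form exists for the same step, so an iterative one-dimensional solve is needed; that difference is the trade-off the abstract points to.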
