Paper title
ABM: an automatic supervised feature engineering method for loss-based models based on group and fused lasso
Paper authors
Paper abstract
A vital step in solving classification or regression problems is to apply feature engineering and variable selection to the data before it is fed into a model. One of the most popular feature engineering methods is to discretize a continuous variable at a set of cutting points, which is referred to as binning. Good cutting points are important for improving a model's ability, because good binning can ignore noisy variance within a continuous variable's range while keeping useful level information with well-ordered encodings. However, to the best of our knowledge, most cutting point selection is done via researchers' domain knowledge or naive methods such as equal-width or equal-frequency cutting. In this paper we propose an end-to-end supervised cutting point selection method based on group and fused lasso, which also has an automatic variable selection effect. We name our method \textbf{ABM} (automatic binning machine). We first cut each variable's range into fine grid bins and then train the model with our group and group fused lasso regularization on successive bins. The method thus integrates feature engineering, variable selection and model training simultaneously. Furthermore, the method is flexible: it can be applied to a range of loss-function-based models, including deep neural networks. We have implemented the method in R and released the source code to other researchers. A Python version will also be made available within days.
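The fine-grid binning and regularization described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' R implementation: the function names, the equal-frequency starting grid, the exact form of the penalty (an L2 norm on differences between successive bin coefficients plus an L2 norm on the whole per-variable block), and the tolerance used to read off cut points are all assumptions made for illustration.

```python
import numpy as np


def fine_grid_edges(x, n_bins):
    """Interior cut points of a fine equal-frequency grid (an assumed starting grid)."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.unique(np.quantile(x, qs))


def one_hot_bins(x, edges):
    """One-hot encode x by the bin it falls into; shape (n_samples, len(edges) + 1)."""
    idx = np.searchsorted(edges, x, side="right")
    return np.eye(len(edges) + 1)[idx]


def abm_style_penalty(W, lam_fused, lam_group):
    """Assumed penalty for one variable; W has shape (n_bins, n_outputs).

    Group fused term: L2 norm of the difference between successive bin rows,
    which pushes neighbouring bins to share coefficients (that is, to merge).
    Group term: L2 norm of the whole block, which can zero out the variable.
    """
    fused = np.linalg.norm(np.diff(W, axis=0), axis=1).sum()
    group = np.linalg.norm(W)
    return lam_fused * fused + lam_group * group


def surviving_cut_points(edges, W, tol=1e-8):
    """Keep only the edges where adjacent bin coefficients still differ."""
    jumps = np.linalg.norm(np.diff(W, axis=0), axis=1)
    return edges[jumps > tol]
```

In use, the penalty would be added to whatever loss the model minimizes, with `W` being the weights attached to one variable's one-hot bins. After training, `surviving_cut_points` reads off the selected cut points, and a variable whose block is entirely zero is effectively dropped.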