Paper Title

Polar Encoding: A Simple Baseline Approach for Classification with Missing Values

Authors

Oliver Urs Lenz, Daniel Peralta, Chris Cornelis

Abstract

We propose polar encoding, a representation of categorical and numerical $[0,1]$-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and $[0,1]$-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and -- depending on the classifier -- about as well or better than mean/mode imputation with missing-indicators.
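The abstract does not spell out the encoding itself, so the following is only a minimal sketch under stated assumptions: a $[0,1]$-valued entry $x$ is mapped to the pair $(x, 1 - x)$, a categorical entry is one-hot encoded, and a missing value becomes the all-zero vector in both cases. The function names `polar_encode_numeric` and `polar_encode_categorical` are hypothetical and chosen for illustration; they are not taken from the paper or from any library.

```python
import numpy as np


def polar_encode_numeric(x):
    """Assumed encoding: x in [0, 1] -> (x, 1 - x); NaN (missing) -> (0, 0)."""
    x = np.asarray(x, dtype=float)
    encoded = np.stack([x, 1.0 - x], axis=1)
    # Rows that correspond to missing inputs are set to the zero vector.
    encoded[np.isnan(x)] = 0.0
    return encoded


def polar_encode_categorical(values, categories):
    """Assumed encoding: one-hot per category; None (missing) -> all zeros."""
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        if value is not None:
            encoded[row, categories.index(value)] = 1.0
    return encoded


numeric = polar_encode_numeric([0.0, 0.25, np.nan, 1.0])
categorical = polar_encode_categorical(["a", None, "c"], ["a", "b", "c"])

# Manhattan distance from the missing row to every non-missing row.
distances = np.abs(numeric[[0, 1, 3]] - numeric[2]).sum(axis=1)
```

Under these assumptions every non-missing numeric row sums to 1 while the missing row sums to 0, so `distances` is the same for all non-missing rows in Manhattan distance. This is one way to read the abstract's claim that missing values are equidistant from non-missing values. No imputation step is involved, and a decision tree can split on either component to send missing values to either side.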
