Paper title

Kendall transformation: a robust representation of continuous data for information theory

Author

Kursa, Miron Bartosz

Abstract

Kendall transformation is a conversion of an ordered feature into a vector of pairwise order relations between individual values. This way, it preserves the ranking of observations and represents it in a categorical form. Such a transformation allows for generalisation of methods requiring strictly categorical input, especially in the limit of a small number of observations, when discretisation becomes problematic. In particular, many approaches of information theory can be directly applied to Kendall-transformed continuous data without relying on differential entropy or any additional parameters. Moreover, by filtering information to that contained in the ranking, Kendall transformation leads to better robustness at a reasonable cost of dropping sophisticated interactions, which are anyhow unlikely to be correctly estimated. In bivariate analysis, Kendall transformation can be related to popular non-parametric methods, showing the soundness of the approach. The paper also demonstrates its efficiency in multivariate problems and provides an example analysis of real-world data.
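The idea in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names `kendall_transform` and `mutual_information`, the choice to enumerate every ordered pair (i, j) with i ≠ j, the symbols `<`, `>`, `=` used as categories, and the plug-in (frequency-based) mutual information estimator are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations
from math import log


def kendall_transform(values):
    """Map an ordered feature to a categorical vector of pairwise relations.

    Assumption: every ordered pair of positions (i, j), i != j, is visited
    in a fixed order, so vectors from features of equal length line up.
    """
    return [
        "<" if a < b else ">" if a > b else "="
        for a, b in permutations(values, 2)
    ]


def mutual_information(a, b):
    """Plug-in mutual information (in nats) of two categorical vectors."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    return sum(
        c / n * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


if __name__ == "__main__":
    x = [1.2, 3.4, 2.2]
    outlier = [1.2, 1e6, 2.2]  # same ranking as x, one extreme value
    # Only the ranking survives, so the outlier does not change the result.
    print(kendall_transform(x) == kendall_transform(outlier))
    # A purely categorical estimator now applies to continuous input.
    print(mutual_information(kendall_transform(x), kendall_transform(outlier)))
```

Because the transformed vector depends only on the order of the observations, any strictly increasing rescaling of the input leaves it unchanged, which is one way to read the robustness claim above. Note that the vector has n(n-1) entries for n observations, so this naive version is only practical for small samples.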
