Paper title
A Computational Model for Logical Analysis of Data
Paper authors
Abstract
Initially introduced by Peter Hammer, Logical Analysis of Data (LAD) is a methodology that aims at computing a logical justification for dividing a set of data into two groups of observations, usually called the positive and negative groups. This partition can be viewed as the description of a partially defined Boolean function; the data is then processed to identify a subset of attributes whose values may be used to characterize the observations of the positive group against those of the negative group. LAD constitutes an interesting rule-based learning alternative to classic statistical learning techniques and has many practical applications. Nevertheless, the computation of group characterizations may be costly, depending on the properties of the data instances. A major aim of our work is to provide effective tools for speeding up these computations, by computing an a priori probability that a given set of attributes does characterize the positive and negative groups. To this end, we propose several models for representing the data set of observations, according to the information we have on it. These models, and the probabilities they allow us to compute, are also helpful for quickly assessing some properties of the real data at hand; furthermore, they may help us to better analyze and understand the computational difficulties encountered by solving methods. Once our models have been established, the mathematical tools for computing probabilities come from Analytic Combinatorics. They allow us to express the desired probabilities as ratios of generating function coefficients, which then provide a quick computation of their numerical values. A further, long-range goal of this paper is to show that the methods of Analytic Combinatorics can help in analyzing the performance of various algorithms in LAD and related fields.
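To make the notion of a set of attributes "characterizing" the two groups concrete, the following minimal sketch may help. It rests on assumptions that the abstract does not state: observations are Boolean tuples, a set of attributes is taken to characterize the groups when no positive and no negative observation agree on all of those attributes, and the a priori probability is estimated by brute-force enumeration under a hypothetical uniform model in which every observation is drawn independently and uniformly. The function names `separates` and `prior_probability` are illustrative only. The paper itself obtains such probabilities from generating function coefficients, not by enumeration; the sketch only illustrates the idea of a probability expressed as a ratio of two counts.

```python
from fractions import Fraction
from itertools import product


def separates(positives, negatives, attrs):
    """Return True if no positive and no negative observation agree on
    every attribute index in `attrs` (assumed meaning of "characterize")."""
    pos_proj = {tuple(row[i] for i in attrs) for row in positives}
    neg_proj = {tuple(row[i] for i in attrs) for row in negatives}
    return pos_proj.isdisjoint(neg_proj)


def prior_probability(n_attrs, n_pos, n_neg, attrs):
    """Brute-force probability that `attrs` separates the groups.

    Assumed model (not from the paper): each of the n_pos + n_neg
    observations is an independent, uniform draw from {0, 1}^n_attrs.
    The cost is exponential, so this is only usable on tiny instances.
    """
    rows = list(product((0, 1), repeat=n_attrs))
    good = 0
    total = 0
    for dataset in product(rows, repeat=n_pos + n_neg):
        total += 1
        if separates(dataset[:n_pos], dataset[n_pos:], attrs):
            good += 1
    # The probability is a ratio of two counts.
    return Fraction(good, total)


# Toy data set: three Boolean attributes, two observations per group.
positives = [(1, 0, 1), (1, 1, 0)]
negatives = [(0, 0, 1), (0, 1, 1)]

first_attr_ok = separates(positives, negatives, [0])
last_two_ok = separates(positives, negatives, [1, 2])
p_one_attr = prior_probability(2, 1, 1, [0])
p_two_attrs = prior_probability(2, 1, 1, [0, 1])
```

On the toy data, attribute 0 alone separates the two groups, whereas attributes 1 and 2 together do not, because the positive observation (1, 0, 1) and the negative observation (0, 0, 1) agree on both. Under the assumed uniform model with one observation per group, enlarging the attribute set from one to two attributes raises the probability of separation from 1/2 to 3/4.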