Paper title
Clustering Data with Nonignorable Missingness using Semi-Parametric Mixture Models
Authors
Abstract
We are concerned with clustering continuous data sets subject to non-ignorable missingness. We perform clustering with a specific semi-parametric mixture, under the assumption of conditional independence given the component. The mixture model is used for clustering and not for estimating the density of the full variables (observed and unobserved); thus we need neither additional assumptions on the component distributions nor a specification of the missingness mechanism. Estimation is performed by maximizing an extension of the smoothed likelihood that allows for missingness. This optimization is achieved by a Minorization-Maximization (MM) algorithm. We illustrate the relevance of our approach through numerical experiments. Under mild assumptions, we show the identifiability of the model defining the distribution of the observed data and the monotonicity of the algorithm. We also propose an extension of this new method to mixed-type data, which we illustrate on a real data set.
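The abstract does not spell out the estimator, so the following is only a rough illustration of the general idea, not the authors' method. It assumes that, given the component, each variable is independent, is missing with a component-specific probability, and otherwise has an unspecified density estimated by weighted kernel smoothing. It uses a simplified npEM-style update (plain weighted kernel density estimates, no nonlinear smoothing), so it is not the smoothed-likelihood MM algorithm described above and carries no monotonicity guarantee. The function names, the Gaussian kernel, the bandwidth `h` and the k-means start are all hypothetical choices.

```python
import numpy as np


def gaussian_kernel(u, h):
    """Gaussian kernel K_h(u) with bandwidth h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))


def init_posteriors(X, K, n_lloyd=10):
    """Hypothetical start: farthest-point seeding + k-means on mean-imputed data."""
    Z = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    centers = [Z[0]]
    for _ in range(1, K):
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    C = np.array(centers)
    for _ in range(n_lloyd):
        lab = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([Z[lab == k].mean(axis=0) if np.any(lab == k) else C[k]
                      for k in range(K)])
    T = np.full((len(Z), K), 0.1)
    T[np.arange(len(Z)), lab] = 0.9
    return T / T.sum(axis=1, keepdims=True)


def cluster_with_missingness(X, K, h=0.5, n_iter=50):
    """Hypothetical npEM-style sketch; np.nan marks a missing entry.

    Given component k, variable j is missing with probability tau[k, j]
    and otherwise follows a density estimated by weighted kernel smoothing;
    variables are treated as independent given the component.
    """
    n, d = X.shape
    R = np.isnan(X)                    # missingness indicators
    Xf = np.where(R, 0.0, X)           # placeholder; missing cells are discarded below
    T = init_posteriors(X, K)          # (n, K) posterior membership probabilities
    for _ in range(n_iter):
        # M-step: mixing proportions and per-component missingness rates
        # (the weighted kernel densities are built inside the loop below)
        W = T.sum(axis=0)
        pi = np.clip(W / n, 1e-12, None)
        tau = np.clip((T.T @ R) / (W[:, None] + 1e-300), 1e-6, 1 - 1e-6)
        # E-step: log of pi_k * prod_j p_kj(x_ij, r_ij) for every row and component
        logL = np.tile(np.log(pi), (n, 1))
        for j in range(d):
            obs = ~R[:, j]
            Wobs = T[obs]                              # weights of observed rows
            Kmat = gaussian_kernel(Xf[:, j][:, None] - Xf[obs, j][None, :], h)
            dens = Kmat @ Wobs / (Wobs.sum(axis=0) + 1e-300)  # weighted KDE, (n, K)
            log_obs = np.log(1.0 - tau[:, j]) + np.log(dens + 1e-300)
            logL += np.where(R[:, [j]], np.log(tau[:, j]), log_obs)
        logL -= logL.max(axis=1, keepdims=True)        # stabilise before exp
        T = np.exp(logL)
        T /= T.sum(axis=1, keepdims=True)
    return T.argmax(axis=1), T, tau
```

Called as `labels, T, tau = cluster_with_missingness(X, K=2)`, the sketch returns hard labels, posterior probabilities and the estimated missingness rates. Because the missingness indicators enter each component's likelihood, a row with few observed values can still be assigned from its missingness pattern when the rates differ across clusters.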