Paper title
A Doubly-Enhanced EM Algorithm for Model-Based Tensor Clustering
Paper authors
Abstract
Modern scientific studies often collect data sets in the form of tensors, which call for innovative statistical analysis methods. In particular, there is a pressing need for tensor clustering methods to understand the heterogeneity in the data. We propose a tensor normal mixture model (TNMM) approach to enable probabilistic interpretation and computational tractability. Our statistical model leverages the tensor covariance structure to reduce the number of parameters for parsimonious modeling, and at the same time explicitly exploits the correlations for better variable selection and clustering. We propose a doubly-enhanced expectation-maximization (DEEM) algorithm to perform clustering under this model. Both the E-step and the M-step are carefully tailored for tensor data in order to account for statistical accuracy and computational cost in high dimensions. Theoretical studies confirm that DEEM achieves consistent clustering even when the dimension of each mode of the tensors grows at a rate exponential in the sample size. Numerical studies demonstrate favorable performance of DEEM in comparison to existing methods.
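To make the parsimony claim concrete, the following is a minimal Python sketch of a tensor normal mixture with one covariance matrix per mode. It is not the DEEM algorithm: it shows only a textbook tensor normal log-density and a plain (unenhanced) E-step, and it assumes that all clusters share the same mode-wise covariances. The function names `cov_param_counts`, `tensor_normal_logpdf`, and `responsibilities` are hypothetical and chosen for illustration.

```python
import numpy as np

def cov_param_counts(dims):
    """Covariance parameters: unstructured vs. one matrix per mode.

    The separable count ignores the scale identifiability constraint
    between modes, so it is a slight overcount.
    """
    p = int(np.prod(dims))
    full = p * (p + 1) // 2
    separable = sum(d * (d + 1) // 2 for d in dims)
    return full, separable

def tensor_normal_logpdf(x, mean, mode_covs):
    """Log-density of a tensor normal with one covariance matrix per mode."""
    r = x - mean
    p = r.size
    z = r
    logdet = 0.0
    for m, cov in enumerate(mode_covs):
        # Multiply mode m of the residual by the inverse mode covariance.
        z = np.moveaxis(np.tensordot(np.linalg.inv(cov), z, axes=(1, m)), 0, m)
        # Each mode's log-determinant is weighted by the size of the other modes.
        logdet += (p // cov.shape[0]) * np.linalg.slogdet(cov)[1]
    quad = float(np.sum(r * z))
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + quad)

def responsibilities(x, weights, means, mode_covs):
    """Plain E-step: posterior cluster probabilities for one tensor x."""
    logs = np.array([
        np.log(w) + tensor_normal_logpdf(x, mu, mode_covs)
        for w, mu in zip(weights, means)
    ])
    logs -= logs.max()  # stabilize before exponentiating
    probs = np.exp(logs)
    return probs / probs.sum()

# A 10 x 10 x 10 tensor: unstructured vs. separable covariance parameters.
print(cov_param_counts((10, 10, 10)))  # (500500, 165)
```

For a 10 x 10 x 10 tensor, the separable structure replaces roughly half a million covariance parameters with 165, which is the kind of reduction the abstract refers to. A high-dimensional method such as DEEM would additionally need sparsity or regularization in both steps, which this sketch leaves out.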