Paper Title
Scalable Hybrid HMM with Gaussian Process Emission for Sequential Time-series Data Clustering
Paper Authors
Paper Abstract
A hidden Markov model (HMM) combined with Gaussian process (GP) emissions can be used effectively to estimate hidden states from a sequence of observations with complex input-output relationships. In particular, when the spectral mixture (SM) kernel is used for the GP emission, we call the model a hybrid HMM-GPSM. This model can effectively model sequences of time-series data. However, because the SM kernel has a large number of parameters, the model cannot be trained efficiently on a large volume of data having (1) long sequences of state transitions and (2) a large amount of time-series data in each sequence. This paper proposes a scalable learning method for HMM-GPSM. To train the model efficiently on long sequences, the proposed method employs a Stochastic Variational Inference (SVI) approach. In addition, to efficiently process the large number of data points in each time series, we approximate the SM kernel using Reparametrized Random Fourier Features (R-RFF). The combination of these two techniques significantly reduces the training time. We validate the proposed learning method in terms of its hidden-state estimation accuracy and computation time using large-scale synthetic and real data sets with missing values.
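To make the kernel approximation mentioned in the abstract more concrete, the following is a minimal sketch of random Fourier features for a one-dimensional spectral mixture kernel, with frequencies drawn through a reparameterization (mean plus scaled standard-normal noise). It is a generic textbook-style construction written in NumPy, not necessarily the exact R-RFF formulation of the paper. The function names `sm_kernel` and `rff_features`, the kernel parameterization k(τ) = Σ_q w_q exp(-2π² τ² v_q) cos(2π μ_q τ), and all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def sm_kernel(tau, weights, means, variances):
    """Exact 1-D spectral mixture kernel evaluated at lags tau (assumed form)."""
    tau = np.asarray(tau, dtype=float)[..., None]  # trailing axis for the Q components
    return np.sum(
        weights
        * np.exp(-2.0 * np.pi**2 * tau**2 * variances)
        * np.cos(2.0 * np.pi * means * tau),
        axis=-1,
    )

def rff_features(x, weights, means, variances, eps):
    """Random Fourier features whose inner products approximate sm_kernel.

    eps has shape (Q, M) and holds standard-normal noise drawn once, so the
    sampled frequencies are a deterministic function of (means, variances).
    """
    x = np.asarray(x, dtype=float)
    n_features = eps.shape[1]
    # Reparameterized frequency samples: s = mu + sqrt(v) * eps, shape (Q, M).
    freqs = means[:, None] + np.sqrt(variances)[:, None] * eps
    phase = 2.0 * np.pi * x[:, None, None] * freqs[None]  # shape (N, Q, M)
    scale = np.sqrt(weights / n_features)[None, :, None]
    feats = np.concatenate(
        [scale * np.cos(phase), scale * np.sin(phase)], axis=-1
    )  # shape (N, Q, 2M)
    return feats.reshape(len(x), -1)  # shape (N, 2 * Q * M)

# Illustrative (made-up) kernel parameters with Q = 2 mixture components.
rng = np.random.default_rng(0)
weights = np.array([1.0, 0.5])
means = np.array([0.5, 2.0])
variances = np.array([0.01, 0.04])
eps = rng.standard_normal((2, 2000))  # M = 2000 frequencies per component

x = np.linspace(0.0, 3.0, 50)
phi = rff_features(x, weights, means, variances, eps)
k_approx = phi @ phi.T  # low-rank approximation of the Gram matrix
k_exact = sm_kernel(x[:, None] - x[None, :], weights, means, variances)
max_err = np.max(np.abs(k_approx - k_exact))
print(f"max abs error: {max_err:.4f}")
```

The approximation works because, for a frequency s drawn from a Gaussian with mean μ and variance v, the expectation of cos(2π s τ) equals exp(-2π² τ² v) cos(2π μ τ), which is one component of the kernel above. Working with the N × D feature matrix instead of the N × N Gram matrix is what allows the cost to scale with the number of features rather than cubically with the number of data points, provided D is smaller than N.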