Paper Title

Asymptotic Theory of Principal Component Analysis for Time Series Data with Cautionary Comments

Authors

Zhang, Xinyu; Tong, Howell

Abstract

Principal component analysis (PCA) is one of the most frequently used statistical tools in almost all branches of data science. However, like many other statistical tools, it is sometimes at risk of misuse or even abuse. In this paper, we highlight possible pitfalls in applying theoretical results for PCA that assume independent data when the data are in fact time series. For time series data, we state and prove a central limit theorem for the eigenvalues and eigenvectors (loadings), give direct and bootstrap estimators of their asymptotic covariances, and assess their efficacy via simulation. Specifically, we pay attention to the proportion of variation, which determines the number of principal components (PCs), and to the loadings, which help interpret the meaning of the PCs. We find that while the proportion of variation is quite robust to different dependence assumptions, inference on the PC loadings requires careful attention. We open and close our investigation with an empirical example on portfolio management, in which the PC loadings play a prominent role; it serves as a paradigm of the correct use of PCA for time series data.
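The two quantities the abstract focuses on can be illustrated with a small sketch: the proportion of variation explained by the first PC (its eigenvalue divided by the trace of the covariance matrix) and its loading vector, together with bootstrap standard errors for both. This is not the authors' estimator. It assumes a moving-block bootstrap, a standard resampling scheme for dependent data that we swap in because the abstract does not say which bootstrap the paper uses; setting `block_len=1` reduces it to the ordinary i.i.d. bootstrap, which ignores serial dependence. The function names and the simulated AR(1) data with a common shock are hypothetical and chosen only for illustration.

```python
import math
import random


def covariance_matrix(data):
    """Sample covariance matrix of a list of p-dimensional observations (rows)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for row in data:
        c = [row[j] - means[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += c[a] * c[b]
    return [[cov[a][b] / (n - 1) for b in range(p)] for a in range(p)]


def leading_pc(cov, iters=200):
    """Largest eigenvalue and unit eigenvector (loading) via power iteration."""
    p = len(cov)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the eigenvalue for a unit vector v.
    lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    # Sign convention: largest-magnitude loading is positive, so that
    # bootstrap replicates are comparable.
    k = max(range(p), key=lambda j: abs(v[j]))
    if v[k] < 0:
        v = [-x for x in v]
    return lam, v


def simulate_var1(n, phi, seed):
    """Hypothetical data: three AR(1) series sharing a common shock."""
    rng = random.Random(seed)
    x = [0.0, 0.0, 0.0]
    out = []
    for _ in range(n + 100):  # first 100 steps are discarded as burn-in
        common = rng.gauss(0, 1)
        x = [phi * x[j] + common + 0.5 * rng.gauss(0, 1) for j in range(3)]
        out.append(x)
    return out[100:]


def block_bootstrap_se(data, block_len, n_boot, seed):
    """Moving-block bootstrap SEs of the first PC's proportion of variation
    and of each of its loadings (block_len=1 is the i.i.d. bootstrap)."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    n_blocks = math.ceil(n / block_len)
    props, loads = [], []
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            start = rng.randrange(n - block_len + 1)
            sample.extend(data[start:start + block_len])
        cov = covariance_matrix(sample[:n])
        lam, v = leading_pc(cov)
        props.append(lam / sum(cov[j][j] for j in range(p)))
        loads.append(v)

    def sd(xs):
        m = sum(xs) / len(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

    return sd(props), [sd([v[j] for v in loads]) for j in range(p)]
```

A typical use is to compute both versions on the same series, for example `block_bootstrap_se(data, block_len=1, ...)` versus `block_bootstrap_se(data, block_len=40, ...)`, and compare the standard errors. The block length here is an arbitrary illustrative choice, not a recommendation from the paper.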