论文标题
高维度曲线的随机预测
Random projections for curves in high dimensions
论文作者
论文摘要
现代时间序列分析需要能够处理固有高维的数据集;示例包括在气候学中的应用,必须考虑许多传感器的测量,或者对大型商店的库存跟踪,其中尺寸由跟踪项目的数量定义。减轻数据的高维度引起的计算问题的标准方法是应用一些缩小技术来保留环境空间的结构属性。两个时间序列之间的差异通常通过``离散''距离的概念,例如动态的时间扭曲或离散的fréchet距离。由于所有这些距离函数都是直接在时间序列的点上计算的,因此它们对不同的采样率或间隙敏感。连续的Fréchet距离提供了一种流行的替代方案,旨在通过考虑通过在序列中的任何两个连续点之间线性插值获得的多边形曲线上的所有点来减轻这一点。 我们研究了随机投影的能力 - 约翰逊和Lindenstrauss通过有效降低维度来保持多边形曲线的连续fréchet距离。特别是,我们表明,一个人可以将尺寸降低到$ o(ε^{ - 2} \ log n)$,其中$ n $是输入点的总数,同时在任何两个确定的多边形曲线之间保留连续的fréchet距离内的连续fréchet距离内的$ 1 \ pm pmε$。我们以聚类的申请结束。
Modern time series analysis requires the ability to handle datasets that are inherently high-dimensional; examples include applications in climatology, where measurements from numerous sensors must be taken into account, or inventory tracking of large shops, where the dimension is defined by the number of tracked items. The standard way to mitigate computational issues arising from the high dimensionality of the data is by applying some dimension reduction technique that preserves the structural properties of the ambient space. The dissimilarity between two time series is often measured by ``discrete'' notions of distance, e.g. the dynamic time warping or the discrete Fréchet distance. Since all these distance functions are computed directly on the points of a time series, they are sensitive to different sampling rates or gaps. The continuous Fréchet distance offers a popular alternative which aims to alleviate this by taking into account all points on the polygonal curve obtained by linearly interpolating between any two consecutive points in a sequence. We study the ability of random projections à la Johnson and Lindenstrauss to preserve the continuous Fréchet distance of polygonal curves by effectively reducing the dimension. In particular, we show that one can reduce the dimension to $O(ε^{-2} \log N)$, where $N$ is the total number of input points while preserving the continuous Fréchet distance between any two determined polygonal curves within a factor of $1\pm ε$. We conclude with applications on clustering.