Paper Title

Identifying Correlation in Stream of Samples

Authors

Zhenhao Gu, Hao Zhang

Abstract

Determining whether two random variables are independent or correlated, given their samples, is a fundamental problem in statistics. However, how to do so space-efficiently when the number of states is large has not been well studied. We propose a new, simple counter matrix algorithm, which uses hash functions and a compressed counter matrix to give an unbiased estimate of the $\ell_2$ independence metric. With $\mathcal{O}(\epsilon^{-4}\log\delta^{-1})$ space (a very loose bound), we can guarantee a $1\pm\epsilon$ multiplicative error with probability at least $1-\delta$. We also compare our algorithm with the state-of-the-art sketching of sketches algorithm and show that ours is effective, and is in fact faster and at least 2 times more space-efficient.
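To make the estimated quantity concrete, here is a minimal sketch of the naive exact baseline, not the counter matrix algorithm from the paper. It assumes the $\ell_2$ independence metric is the Euclidean distance between the empirical joint distribution and the product of the empirical marginals; whether the paper uses this form or its square is an assumption here. The function name `l2_independence` is hypothetical. It stores one counter per observed pair, which is the space cost a compressed sketch is meant to avoid.

```python
from collections import Counter
from math import sqrt


def l2_independence(samples):
    """Exact l2 distance between the empirical joint distribution of (x, y)
    pairs and the product of the empirical marginals (assumed definition)."""
    samples = list(samples)
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")

    # Naive baseline: one counter per distinct pair and per distinct value.
    joint = Counter(samples)
    left = Counter(x for x, _ in samples)
    right = Counter(y for _, y in samples)

    total = 0.0
    for x, cx in left.items():
        for y, cy in right.items():
            # Pairs never observed still contribute through the product term.
            diff = joint.get((x, y), 0) / n - (cx / n) * (cy / n)
            total += diff * diff
    return sqrt(total)


if __name__ == "__main__":
    # Uniform over all four pairs: joint equals the product of marginals.
    print(l2_independence([(0, 0), (0, 1), (1, 0), (1, 1)]))
    # Perfectly correlated pairs: joint differs from the product.
    print(l2_independence([(0, 0), (1, 1)]))
```

A value near zero suggests the samples look independent under this metric, while larger values indicate correlation. The memory used grows with the number of distinct states, which is the regime the abstract targets.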
