Paper Title

Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks

Authors

Lenore Cowen, Kapil Devkota, Xiaozhe Hu, James M. Murphy, Kaiyi Wu

Abstract

Data-dependent metrics are powerful tools for learning the underlying structure of high-dimensional data. This article develops and analyzes a data-dependent metric known as diffusion state distance (DSD), which compares points using a data-driven diffusion process. Unlike related diffusion methods, DSDs incorporate information across time scales, which allows for the intrinsic data structure to be inferred in a parameter-free manner. This article develops a theory for DSD based on the multitemporal emergence of mesoscopic equilibria in the underlying diffusion process. New algorithms for denoising and dimension reduction with DSD are also proposed and analyzed. These approaches are based on a weighted spectral decomposition of the underlying diffusion process, and experiments on synthetic datasets and real biological networks illustrate the efficacy of the proposed algorithms in terms of both speed and accuracy. Throughout, comparisons with related methods are made, in order to illustrate the distinct advantages of DSD for datasets exhibiting multiscale structure.
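
To make the idea of comparing points through a diffusion process across all time scales concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a connected, undirected graph given as a dense adjacency matrix, uses the commonly cited closed form in which rows of (I - P + W)^{-1} are compared (P is the random-walk matrix and every row of W is the stationary distribution), and picks the Euclidean norm as an illustrative choice. The function name `diffusion_state_distance` is hypothetical.

```python
import numpy as np

def diffusion_state_distance(A):
    """Pairwise diffusion state distances for a connected undirected graph.

    A is a dense (n, n) adjacency matrix (an assumption of this sketch).
    Returns an (n, n) matrix D with D[u, v] the distance between nodes u and v.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)
    P = A / deg[:, None]          # row-stochastic random-walk matrix
    pi = deg / deg.sum()          # stationary distribution of the walk
    W = np.tile(pi, (n, 1))       # every row of W equals pi
    # For an aperiodic walk, differences between rows of G equal differences
    # between rows of sum_{t>=0} P^t, so every time scale contributes and no
    # diffusion-time parameter has to be chosen.
    G = np.linalg.inv(np.eye(n) - P + W)
    diff = G[:, None, :] - G[None, :, :]
    return np.linalg.norm(diff, axis=2)   # Euclidean norm (illustrative choice)

# Example graph: a triangle (nodes 0, 1, 2) with a pendant node 3 attached to node 2.
A_example = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
D_example = diffusion_state_distance(A_example)
```

The dense inverse costs O(n^3) time, so this form is only practical for small graphs; the abstract's weighted spectral decomposition is what addresses speed on larger networks.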
