Paper Title

Integrative decomposition of multi-source data by identifying partially-joint score subspaces

Authors

SeoWon Gabriel Choi, Sungkyu Jung

Abstract

The analysis of multi-source datasets, in which data on the same objects are collected from multiple sources, is of rising importance in many fields, most notably in multi-omics biology. A novel framework and algorithms for the integrative decomposition of such multi-source data are proposed to identify and sort out common factor scores according to whether the scores are relevant to all data sources (fully joint), to some data sources (partially joint), or to a single data source. The key difference between the proposed method and existing approaches is that raw source-wise factor score subspaces are used to identify the partially-joint block-wise association structure. To identify common score subspaces, which may be partially joint to some of the data sources, from noisy observations, the proposed algorithm sequentially computes one-dimensional flag means among the source-wise score subspaces, then collects the subspaces that are close to the mean. The proposed decomposition is computationally fast and is superior to competing approaches in identifying the true partially-joint association structure and in recovering the joint loading and score subspaces. The proposed decomposition is applied to a blood cancer multi-omics data set containing measurements from three data sources. Our method identifies a latent score that is partially joint to the drug panel and methylation profile data sources but not relevant to the RNA sequencing profiles, and that helps discover hidden clusters in the data.
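
The flag-mean step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each source-wise score subspace is estimated by a truncated SVD, that the one-dimensional flag mean is taken as the leading left singular vector of the concatenated orthonormal bases, and that "close to the mean" means a principal angle below a fixed threshold. The function names, the toy data, and the 30-degree threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def score_basis(X, rank):
    """Orthonormal basis of the leading score subspace of X (objects in rows)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]

def flag_mean_direction(bases):
    """One-dimensional flag mean, taken here as the leading left singular
    vector of the column-wise concatenation of the orthonormal bases."""
    U, _, _ = np.linalg.svd(np.hstack(bases), full_matrices=False)
    return U[:, 0]

def angles_to_direction(bases, m):
    """Principal angle (degrees) between the unit vector m and each subspace."""
    cosines = [min(1.0, float(np.linalg.norm(U.T @ m))) for U in bases]
    return np.degrees(np.arccos(cosines))

# Toy data (assumption): three sources on the same n objects; only the
# first two sources share a common score vector.
rng = np.random.default_rng(0)
n, p = 200, 50
shared = rng.standard_normal((n, 1))

def simulate_source(has_shared):
    scores = rng.standard_normal((n, 2))
    if has_shared:
        scores[:, [0]] = shared          # plant the shared score
    loadings = rng.standard_normal((2, p))
    return scores @ loadings + 0.1 * rng.standard_normal((n, p))

sources = [simulate_source(True), simulate_source(True), simulate_source(False)]
bases = [score_basis(X, rank=2) for X in sources]

m = flag_mean_direction(bases)
angles = angles_to_direction(bases, m)

threshold_deg = 30.0                     # hypothetical cutoff
joint_sources = [k for k, a in enumerate(angles) if a < threshold_deg]
print(joint_sources, np.round(angles, 1))
```

In this toy setup the recovered direction should align with the planted shared score, and only the first two sources should fall within the threshold, which corresponds to a partially-joint component. The sequential procedure in the paper, including how the cutoff is chosen and how later components are extracted, is not specified in the abstract and is not reproduced here.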
