Paper Title

Self-omics: A Self-supervised Learning Framework for Multi-omics Cancer Data

Authors

Sayed Hashim, Karthik Nandakumar, Mohammad Yaqub

Abstract

We have gained access to vast amounts of multi-omics data thanks to Next Generation Sequencing. However, this data is challenging to analyse because it is high-dimensional and much of it is not annotated. Lack of annotated data is a significant problem in machine learning, and Self-Supervised Learning (SSL) methods are typically used to deal with limited labelled data. However, there is a lack of studies that use SSL methods to exploit inter-omics relationships in unlabelled multi-omics data. In this work, we develop a novel and efficient pre-training paradigm that consists of various SSL components, including but not limited to contrastive alignment, data recovery from corrupted samples, and using one type of omics data to recover other omics types. Our pre-training paradigm improves performance on downstream tasks with limited labelled data. We show that our approach outperforms the state-of-the-art method in cancer type classification on the TCGA pan-cancer dataset in a semi-supervised setting. Moreover, we show that the encoders pre-trained using our approach can be used as powerful feature extractors even without fine-tuning. Our ablation study shows that the method is not overly dependent on any single pretext task component. The network architectures in our approach are designed to handle missing omics types and multiple datasets for pre-training and downstream training. Our pre-training paradigm can be extended to perform zero-shot classification of rare cancers.
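
The abstract names contrastive alignment and recovery from corrupted samples among the pretext tasks, but gives no implementation detail. The following is a minimal sketch of what such objectives could look like, not the authors' code. The function names, the zero-masking corruption, the InfoNCE-style form of the loss, and the temperature value are all assumptions made for illustration.

```python
import math
import random


def corrupt(sample, mask_rate, rng):
    """Zero out each feature with probability mask_rate (assumed corruption scheme)."""
    return [0.0 if rng.random() < mask_rate else x for x in sample]


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def contrastive_alignment_loss(emb_a, emb_b, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of emb_a should match row i
    of emb_b (same patient, different omics type) and none of the other rows."""
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of patient i (omics A) with every patient (omics B).
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def reconstruction_loss(original, reconstructed):
    """Mean squared error between a sample and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy embeddings for two patients in two hypothetical omics types.
    omics_a = [[1.0, 0.0], [0.0, 1.0]]
    omics_b = [[0.9, 0.1], [0.2, 0.8]]
    print("alignment loss:", contrastive_alignment_loss(omics_a, omics_b))
    print("corrupted sample:", corrupt([0.5, 1.5, 2.5, 3.5], 0.5, rng))
```

In a pre-training setup along these lines, an encoder would receive the corrupted sample and a decoder would be trained to minimise `reconstruction_loss` against the uncorrupted one, while the per-omics embeddings are pulled together by the alignment term. Cross-omics recovery could reuse the same reconstruction loss with the target taken from a different omics type.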
