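Improving generalization of machine learning-identified biomarkers with causal modeling: an investigation into immune receptor diagnostics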

Paper title

Improving generalization of machine learning-identified biomarkers with causal modeling: an investigation into immune receptor diagnostics

Authors

Pavlović, Milena, Al Hajj, Ghadi S., Kanduri, Chakravarthi, Pensar, Johan, Wood, Mollie, Sollid, Ludvig M., Greiff, Victor, Sandve, Geir Kjetil

Abstract

Machine learning is increasingly used to discover diagnostic and prognostic biomarkers from high-dimensional molecular data. However, a variety of factors related to experimental design may affect the ability to learn generalizable and clinically applicable diagnostics. Here, we argue that a causal perspective improves the identification of these challenges and formalizes their relation to the robustness and generalization of machine learning-based diagnostics. To make the discussion concrete, we focus on a specific, recently established high-dimensional biomarker: adaptive immune receptor repertoires (AIRRs). Through simulations, we illustrate how major biological and experimental factors of the AIRR domain may influence the learned biomarkers. In conclusion, we argue that causal modeling improves machine learning-based biomarker robustness by identifying stable relations between variables and by guiding the adjustment of the relations and variables that vary between populations.
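The core argument, that a feature tied to an experimental factor can look like a biomarker in one population and fail in another, can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the authors' simulation framework: the disease label, the hypothetical batch indicator (standing in for any experimental factor such as sequencing protocol), the two features, the effect sizes, and the cohort sizes are all illustrative choices. Feature `x1` is caused by disease; feature `x2` is caused only by batch. In the training cohort batch is associated with disease through study design; in the deployment cohort it is not. Adjusting for batch is done here by standardization over batch strata, one simple form of covariate adjustment.

```python
import random
from statistics import fmean


def simulate(n, p_b_case, p_b_control, seed):
    """Toy cohort of n samples as (d, b, x1, x2) tuples (all assumptions).

    d  : disease status (1 = case), P(d = 1) = 0.5
    b  : hypothetical batch indicator; P(b = 1 | d) reflects study design
         and is the relation assumed to differ between populations
    x1 : feature causally affected by disease (effect size 1.0)
    x2 : feature affected only by batch (effect size 1.0)
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        d = int(rng.random() < 0.5)
        b = int(rng.random() < (p_b_case if d else p_b_control))
        x1 = d + rng.gauss(0.0, 1.0)
        x2 = b + rng.gauss(0.0, 1.0)
        rows.append((d, b, x1, x2))
    return rows


def mean_diff(rows, j):
    """Naive association: mean of feature column j in cases minus controls."""
    cases = [r[j] for r in rows if r[0] == 1]
    controls = [r[j] for r in rows if r[0] == 0]
    return fmean(cases) - fmean(controls)


def batch_adjusted_diff(rows, j):
    """Case-control difference computed within each batch stratum, then
    averaged with stratum-size weights (standardization over batch)."""
    total = 0.0
    for b in (0, 1):
        stratum = [r for r in rows if r[1] == b]
        total += len(stratum) / len(rows) * mean_diff(stratum, j)
    return total


# Training cohort: cases mostly processed in batch 1, controls in batch 0.
train = simulate(20_000, p_b_case=0.9, p_b_control=0.1, seed=1)
# Deployment cohort: batch assignment is unrelated to disease status.
deploy = simulate(20_000, p_b_case=0.5, p_b_control=0.5, seed=2)

X1, X2 = 2, 3  # column indices of the two features
for name, j in (("x1 (disease-driven)", X1), ("x2 (batch-driven)", X2)):
    print(f"{name}: naive train={mean_diff(train, j):.2f} "
          f"naive deploy={mean_diff(deploy, j):.2f} "
          f"adjusted train={batch_adjusted_diff(train, j):.2f}")
```

Under these assumptions the naive case-control difference for `x2` should be roughly 0.8 in training (0.9 minus 0.1) and close to zero in deployment, so a model trained on it would not transfer; the disease-driven `x1` stays near 1.0 in both cohorts. Adjusting for batch within the training data already removes most of the spurious `x2` signal while leaving `x1` intact, which mirrors the abstract's point that modeling how variables relate, and which relations vary between populations, guides what to adjust for.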
