Paper title
VaiPhy: a Variational Inference Based Algorithm for Phylogeny
Paper authors
Paper abstract
Phylogenetics is a classical methodology in computational biology that has become highly relevant for the medical investigation of single-cell data, e.g., in the context of cancer development. Unfortunately, the exponential size of the tree space is a substantial obstacle for Bayesian phylogenetic inference using Markov chain Monte Carlo based methods, since these rely on local operations. Although more recent variational inference (VI) based methods offer speed improvements, they rely on expensive auto-differentiation operations for learning the variational parameters. We propose VaiPhy, a remarkably fast VI based algorithm for approximate posterior inference in an augmented tree space. VaiPhy produces marginal log-likelihood estimates on par with state-of-the-art methods on real data and is considerably faster since it does not require auto-differentiation. Instead, VaiPhy combines coordinate ascent update equations with two novel sampling schemes: (i) SLANTIS, a proposal distribution for tree topologies in the augmented tree space, and (ii) the JC sampler, to the best of our knowledge the first scheme for sampling branch lengths directly from the popular Jukes-Cantor model. We compare VaiPhy with baseline methods in terms of density estimation and runtime. Additionally, we evaluate the reproducibility of the baselines. Our code is available on GitHub (https://github.com/Lagergren-Lab/VaiPhy).
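For readers unfamiliar with the Jukes-Cantor model named in the abstract, the following minimal sketch computes its standard nucleotide transition probabilities as a function of branch length. It only illustrates the textbook JC69 formula; it is not the JC sampler or any other part of VaiPhy, and the function names are hypothetical, chosen for this example only.

```python
import math

NUCLEOTIDES = "ACGT"


def jc69_transition_probs(branch_length):
    """Return (p_same, p_diff) under the Jukes-Cantor (JC69) model.

    branch_length is measured in expected substitutions per site.
    p_same is the probability that a nucleotide is unchanged across the
    branch; p_diff is the probability of ending in one specific other
    nucleotide.
    """
    if branch_length < 0:
        raise ValueError("branch length must be non-negative")
    decay = math.exp(-4.0 * branch_length / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


def jc69_transition_matrix(branch_length):
    """Build the 4x4 transition matrix as a nested dict keyed by nucleotide."""
    p_same, p_diff = jc69_transition_probs(branch_length)
    return {
        a: {b: (p_same if a == b else p_diff) for b in NUCLEOTIDES}
        for a in NUCLEOTIDES
    }


if __name__ == "__main__":
    matrix = jc69_transition_matrix(0.1)
    for a in NUCLEOTIDES:
        print(a, [round(matrix[a][b], 4) for b in NUCLEOTIDES])
```

Under this model every row of the matrix sums to one, a zero-length branch gives the identity matrix, and very long branches approach the uniform distribution over the four nucleotides.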