Paper title
Fast Parallel Bayesian Network Structure Learning
Paper authors
Paper abstract
Bayesian networks (BNs) are graphical models widely used in machine learning to represent knowledge under uncertainty. Mainstream BN structure learning methods require a large number of conditional independence (CI) tests. The learning process is very time-consuming, especially for high-dimensional problems, which hinders the adoption of BNs in more applications. Existing works attempt to accelerate learning through parallelism, but suffer from load imbalance, costly atomic operations, and dominant parallel overhead. In this paper, we propose Fast-BNS, a fast solution on multi-core CPUs that improves the efficiency of BN structure learning. Fast-BNS is powered by a series of efficiency optimizations: (i) a dynamic work pool that monitors the processing of edges and better schedules workloads among threads, (ii) grouping the CI tests of edges with the same endpoints to reduce the number of unnecessary CI tests, (iii) cache-friendly data storage to improve memory efficiency, and (iv) generating conditioning sets on the fly to avoid extra memory consumption. A comprehensive experimental study shows that the sequential version of Fast-BNS is up to 50 times faster than its counterpart, and the parallel version achieves a 4.8 to 24.5 times speedup over the state-of-the-art multi-threaded solution. Moreover, Fast-BNS scales well with both network size and sample size. The Fast-BNS source code is freely available at https://github.com/jjiantong/FastBN.
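To make optimization (iv) concrete, the following is a minimal sketch of one way to produce conditioning sets on demand instead of storing them all. It is not taken from the Fast-BNS source code. It assumes that the candidate conditioning sets for an edge are the size-d subsets of a list of neighboring variables, and that a worker can be handed an integer index k and asked for the k-th subset in lexicographic order. The function names `kth_combination` and `conditioning_set` are hypothetical.

```python
from math import comb


def kth_combination(n, d, k):
    """Return the k-th (0-based, lexicographic) size-d subset of range(n).

    Only O(d) memory is used; the full list of comb(n, d) subsets is never built.
    """
    if not 0 <= k < comb(n, d):
        raise IndexError("combination index out of range")
    result = []
    candidate = 0
    for slots_left in range(d, 0, -1):
        while True:
            # Number of subsets whose next element is `candidate`.
            count = comb(n - candidate - 1, slots_left - 1)
            if k < count:
                break
            # Skip all subsets that pick `candidate` at this position.
            k -= count
            candidate += 1
        result.append(candidate)
        candidate += 1
    return tuple(result)


def conditioning_set(neighbors, d, k):
    """Map the k-th index combination onto variable names (hypothetical helper)."""
    return tuple(neighbors[i] for i in kth_combination(len(neighbors), d, k))


if __name__ == "__main__":
    neighbors = ["A", "B", "C", "D"]
    total = comb(len(neighbors), 2)
    # Each index k can be processed independently, e.g. by a different thread.
    for k in range(total):
        print(k, conditioning_set(neighbors, 2, k))
```

Because each subset is derived from its index alone, the memory needed is proportional to the subset size rather than to the number of subsets, and a range of indices can be assigned to any thread without shared state. Whether Fast-BNS uses this exact indexing scheme is not stated in the abstract.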