Paper Title
From BeyondPlanck to Cosmoglobe: Open Science, Reproducibility, and Data Longevity
Authors
Abstract
The BeyondPlanck and Cosmoglobe collaborations have implemented the first integrated Bayesian end-to-end analysis pipeline for CMB experiments. The primary long-term motivation for this work is to develop a common analysis platform that supports efficient global joint analysis of complementary radio, microwave, and sub-millimeter experiments. A strict prerequisite for this to succeed is broad participation from the CMB community, and two foundational aspects of the program are therefore reproducibility and Open Science. In this paper, we discuss our efforts toward this aim. We also discuss measures toward facilitating easy code and data distribution, community-based code documentation, user-friendly compilation procedures, etc. This work represents the first publicly released end-to-end CMB analysis pipeline that includes raw data, source code, parameter files, and documentation. We argue that such a complete pipeline release should be a requirement for all major future and publicly funded CMB experiments. A full public release significantly increases data longevity by ensuring that the data quality can be improved whenever better processing techniques, complementary datasets, or more computing power become available, and it thereby also increases taxpayers' value for money. Providing only raw data and final products is not sufficient to guarantee full reproducibility in the future.