Title
Mapping robust multiscale communities in chromosome contact networks
Authors
Abstract
To better understand how DNA folds in three dimensions inside cell nuclei, researchers developed chromosome capture methods such as Hi-C, which measure contact frequencies between all pairs of DNA segments across the genome. Because Hi-C data sets are often massive, it is common to use bioinformatics methods to group DNA segments into 3D regions with correlated contact patterns, such as Topologically Associated Domains (TADs) and A/B compartments. Recently, another research direction has emerged that treats Hi-C data as a network of 3D contacts. In this representation, one can use community detection algorithms from complex network theory, which group nodes into tightly connected mesoscale communities. However, because Hi-C networks are so densely connected, several node partitions may represent feasible solutions to the community detection problem, and these partitions are indistinguishable unless other data are included. Because this limitation is a fundamental property of the network, the problem persists regardless of the community-finding or data-clustering method. To help remedy this, we developed a method that charts the solution landscape of network partitions in Hi-C data from human cells. Our approach allows us to scan seamlessly through the scales of the network and identify regimes where we can expect reliable community structures. We find that some scales are more robust than others and that strong clusters may differ significantly. Our work highlights that finding a robust community structure hinges on thoughtful algorithm design or method cross-evaluation.
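The idea that the preferred partition of a contact network depends on the scale at which it is examined can be illustrated with a small sketch. The code below is not the method of the paper: it swaps in generalized (Reichardt-Bornholdt) modularity with a resolution parameter `gamma` as a stand-in scale knob, and it scores a fixed set of hand-written candidate partitions instead of searching for them. The 8-bin contact matrix, its weights, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def toy_contact_matrix():
    """Hypothetical 8-bin contact matrix with two nested levels of structure.

    Bins in the same pair share 10 contacts, bins in the same group of four
    share 4, and all other pairs share 1. These values are invented.
    """
    n = 8
    w = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if i // 2 == j // 2:
            value = 10.0
        elif i // 4 == j // 4:
            value = 4.0
        else:
            value = 1.0
        w[i][j] = w[j][i] = value
    return w


def generalized_modularity(weights, labels, gamma=1.0):
    """Generalized modularity of a partition of a weighted, undirected network.

    weights: symmetric matrix of contact counts with a zero diagonal.
    labels:  labels[i] is the community label of node i.
    gamma:   resolution parameter; larger values favour smaller communities.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree of each node
    two_m = sum(strength)                     # total weight, both directions
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += weights[i][j] - gamma * strength[i] * strength[j] / two_m
    return q / two_m


def best_partition(weights, candidates, gamma):
    """Return the name of the candidate partition with the highest score."""
    return max(
        candidates,
        key=lambda name: generalized_modularity(weights, candidates[name], gamma),
    )


W = toy_contact_matrix()
CANDIDATES = {
    "single": [0, 0, 0, 0, 0, 0, 0, 0],  # one community
    "coarse": [0, 0, 0, 0, 1, 1, 1, 1],  # two communities of four bins
    "fine": [0, 0, 1, 1, 2, 2, 3, 3],    # four communities of two bins
}

if __name__ == "__main__":
    for gamma in (0.2, 1.0, 2.0):
        scores = {
            name: round(generalized_modularity(W, labels, gamma), 4)
            for name, labels in CANDIDATES.items()
        }
        print(gamma, best_partition(W, CANDIDATES, gamma), scores)
```

In this toy example the highest-scoring candidate changes from one community to two to four as `gamma` increases, and close to the crossover values two different partitions score almost equally. That near-tie is a small-scale analogue of the ambiguity the abstract describes, where several partitions are feasible and cannot be told apart from the network alone.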