Visual Analysis of Large Multivariate Scattered Data using Clustering and Probabilistic Summaries

Paper title

Visual Analysis of Large Multivariate Scattered Data using Clustering and Probabilistic Summaries

Authors

Tobias Rapp, Christoph Peters, Carsten Dachsbacher

Abstract

Rapidly growing data sizes of scientific simulations pose significant challenges for interactive visualization and analysis techniques. In this work, we propose a compact probabilistic representation to interactively visualize large scattered datasets. In contrast to previous approaches that represent blocks of volumetric data using probability distributions, we model clusters of arbitrarily structured multivariate data. In detail, we discuss how to efficiently represent and store a high-dimensional distribution for each cluster. We observe that it suffices to consider low-dimensional marginal distributions for two or three data dimensions at a time to employ common visual analysis techniques. Based on this observation, we represent high-dimensional distributions by combinations of low-dimensional Gaussian mixture models. We discuss the application of common interactive visual analysis techniques to this representation. In particular, we investigate several frequency-based views, such as density plots in 1D and 2D, density-based parallel coordinates, and a time histogram. We visualize the uncertainty introduced by the representation, discuss a level-of-detail mechanism, and explicitly visualize outliers. Furthermore, we propose a spatial visualization by splatting anisotropic 3D Gaussians for which we derive a closed-form solution. Lastly, we describe the application of brushing and linking to this clustered representation. Our evaluation on several large, real-world datasets demonstrates the scaling of our approach.
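As a rough illustration of how a frequency-based view can be computed from cluster summaries rather than from the raw points, the following minimal sketch evaluates a 1D density plot from per-cluster Gaussian mixtures. It is not the authors' implementation: the function names, the data layout (a point count plus a list of weight, mean, and standard deviation triples per cluster), and the example numbers are all assumptions made for this example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a 1D Gaussian mixture.

    components: list of (weight, mean, std); weights are assumed to sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def density_plot_1d(clusters, lo, hi, bins):
    """Evaluate the overall density at the bin centers of [lo, hi].

    clusters: list of (point_count, components). Each cluster contributes
    in proportion to the number of points it summarizes (hypothetical layout).
    """
    total = sum(count for count, _ in clusters)
    width = (hi - lo) / bins
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    density = [
        sum(count / total * mixture_pdf(x, comps) for count, comps in clusters)
        for x in centers
    ]
    return centers, density


# Two made-up clusters: one summarized by two components, one by a single one.
clusters = [
    (8000, [(0.7, -1.0, 0.5), (0.3, 0.5, 0.3)]),
    (2000, [(1.0, 3.0, 0.8)]),
]
centers, density = density_plot_1d(clusters, -4.0, 7.0, 220)
```

In this sketch the cost of the plot depends on the number of clusters and mixture components, not on the number of original data points, which is the general motivation for working with a compact probabilistic summary.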
