Paper Title
Understanding High Dimensional Spaces through Visual Means Employing Multidimensional Projections
Authors
Abstract
Data visualisation helps in understanding data represented by multiple variables, also called features, stored in a large matrix in which individuals are stored in rows and variable values in columns. These data structures are frequently called multidimensional spaces. In this paper, we illustrate ways of employing the visual results of multidimensional projection algorithms to understand and fine-tune the parameters of their mathematical framework. Some of the mathematical concepts common to these approaches are Laplacian matrices, Euclidean distance, cosine distance, and statistical methods such as Kullback-Leibler divergence, employed to fit probability distributions and reduce dimensions. Two of the relevant algorithms in the data visualisation field are t-distributed stochastic neighbour embedding (t-SNE) and Least Square Projection (LSP). These algorithms can be used to understand a range of mathematical functions, including their impact on datasets. In this article, the mathematical parameters of underlying techniques, such as Principal Component Analysis (PCA) behind t-SNE and mesh reconstruction methods behind LSP, are adjusted to reflect the properties afforded by the mathematical formulation. The results, supported by illustrative methods of the processes of LSP and t-SNE, are meant to inspire students to understand the mathematics behind such methods, so that they can apply them in effective data analysis tasks across multiple applications.
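As a rough illustration of three quantities named in the abstract (Euclidean distance, cosine distance, and Kullback-Leibler divergence), the following minimal Python sketch computes each from its textbook definition. The function names and sample vectors are assumptions made for this example and are not taken from the paper. The sketch does not implement t-SNE or LSP.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_distance(x, y):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Hypothetical sample data, chosen only for illustration.
    print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
    print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Euclidean distance depends on vector magnitudes, whereas cosine distance depends only on direction, so the two can rank neighbours differently on the same dataset. KL divergence is asymmetric: swapping `p` and `q` generally gives a different value.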