Paper Title

Dimensionality Reduction of SDSS Spectra with Variational Autoencoders

Authors

Portillo, Stephen K. N., Parejko, John K., Vergara, Jorge R., Connolly, Andrew J.

Abstract

High-resolution galaxy spectra contain much information about galactic physics, but the high dimensionality of these spectra makes it difficult to fully utilize the information they contain. We apply variational autoencoders (VAEs), a non-linear dimensionality reduction technique, to a sample of spectra from the Sloan Digital Sky Survey. In contrast to Principal Component Analysis (PCA), a widely used technique, VAEs can capture non-linear relationships between latent parameters and the data. We find that a VAE can reconstruct the SDSS spectra well with only six latent parameters, outperforming PCA with the same number of components. Different galaxy classes are naturally separated in this latent space, without class labels having been given to the VAE. The VAE latent space is interpretable because the VAE can be used to make synthetic spectra at any point in latent space. For example, making synthetic spectra along tracks in latent space yields sequences of realistic spectra that interpolate between two different types of galaxies. Using the latent space to find outliers may yield interesting spectra: in our small sample, we immediately find unusual data artifacts and stars misclassified as galaxies. In this exploratory work, we show that VAEs create compact, interpretable latent spaces that capture non-linear features of the data. While a VAE takes substantial time to train (~1 day for 48,000 spectra), once trained, VAEs can enable the fast exploration of large astronomical data sets.
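The abstract contrasts a linear PCA baseline with a VAE whose latent space can be sampled and traversed. The following is a minimal sketch in Python with NumPy of the generic pieces of that comparison. It is not the authors' code, and it makes several assumptions. The encoder and decoder networks, which would produce `mu`, `log_var` and the reconstructed spectra, are not shown. The reconstruction term assumes Gaussian pixel noise with a fixed `noise_sigma`. All function names are hypothetical, and the demo uses random arrays in place of SDSS spectra. Only the choice of six latent dimensions comes from the abstract.

```python
import numpy as np


def pca_reconstruct(X, k):
    """Reconstruct each row of X (n_spectra x n_pixels) from its top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k]          # (k, n_pixels) linear basis
    Z = Xc @ W.T        # k PCA coefficients per spectrum
    return Z @ W + mean


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), one value per spectrum."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def negative_elbo(x, x_recon, mu, log_var, noise_sigma=1.0):
    """Per-spectrum VAE loss: Gaussian reconstruction term (constants dropped) plus KL term."""
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=-1) / noise_sigma**2
    return recon + kl_to_standard_normal(mu, log_var)


def interpolate_latents(z_a, z_b, n_steps):
    """Points on a straight track from z_a to z_b; decoding each would give a synthetic spectrum."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - t) * z_a + t * z_b


# Demo on random stand-in data (NOT SDSS spectra).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))                   # 100 "spectra" x 50 "pixels"
pca_mse = np.mean((X - pca_reconstruct(X, 6)) ** 2)  # PCA baseline at k = 6

# Hypothetical encoder outputs for 4 spectra in a 6-D latent space.
mu = rng.standard_normal((4, 6))
log_var = np.full((4, 6), -2.0)
z = reparameterize(mu, log_var, rng)
track = interpolate_latents(z[0], z[1], n_steps=5)   # shape (5, 6)
```

`pca_reconstruct` builds each spectrum as a linear combination of a fixed basis. Any curved relationship between the underlying parameters and the flux must therefore be absorbed by extra components. A VAE decoder can represent such curvature directly, which is why the two methods are compared at equal dimensionality (here k = 6).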