Paper Title

Multi-scale Sinusoidal Embeddings Enable Learning on High Resolution Mass Spectrometry Data

Authors

Gennady Voronov, Rose Lightheart, Joe Davison, Christoph A. Krettler, David Healey, Thomas Butler

Abstract

Small molecules in biological samples are studied to provide information about disease states, environmental toxins, natural product drug discovery, and many other applications. The primary window into the composition of small molecule mixtures is tandem mass spectrometry (MS2), which produces data of high sensitivity and parts-per-million resolution. We adopt multi-scale sinusoidal embeddings of the mass data in MS2, designed to meet the challenge of learning from the full resolution of MS2 data. Using these embeddings, we provide a new state-of-the-art model for spectral library search, the standard task for initial evaluation of MS2 data. We also introduce a new task, chemical property prediction from MS2 data, that has natural applications in high-throughput MS2 experiments, and show that an average $R^2$ of 80\% for novel compounds can be achieved across 10 chemical properties prioritized by medicinal chemists. We use dimensionality reduction techniques and experiments with different floating point resolutions to show the essential role multi-scale sinusoidal embeddings play in learning from MS2 data.
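
To make the idea of a multi-scale sinusoidal embedding concrete, the following is a minimal sketch, not the authors' implementation. The function name `sinusoidal_embedding`, the embedding size, and the wavelength range (1e-3 to 1e3 Da, geometrically spaced) are all assumptions chosen for illustration; the abstract does not state these values. The sketch shows how short wavelengths let two m/z values that differ by only a few parts per million map to clearly different vectors, while long wavelengths still encode the coarse mass.

```python
import numpy as np

def sinusoidal_embedding(mz, dim=16, min_wavelength=1e-3, max_wavelength=1e3):
    """Embed m/z values with sin/cos pairs at geometrically spaced wavelengths.

    All defaults are illustrative assumptions, not values from the paper.
    """
    # float64 keeps enough precision to resolve ppm-level mass differences
    mz = np.asarray(mz, dtype=np.float64)
    n_freqs = dim // 2
    # Wavelengths span fine (sub-mDa) to coarse (hundreds of Da) scales
    wavelengths = np.geomspace(min_wavelength, max_wavelength, n_freqs)
    # Phase of each mass at each scale; shape (..., n_freqs)
    angles = 2.0 * np.pi * mz[..., None] / wavelengths
    # Concatenate sine and cosine components; shape (..., dim)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

# Two hypothetical fragment masses about 3 ppm apart
mz = np.array([180.0634, 180.0639])
emb = sinusoidal_embedding(mz)
```

In this sketch the two masses share nearly identical coarse-scale components but differ strongly in the finest-scale components, which is the property a model would need in order to exploit the full instrument resolution.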
