Title
Accurate Virus Identification with Interpretable Raman Signatures by Machine Learning
Authors
Abstract
Rapid identification of newly emerging or circulating viruses is an important first step toward managing the public health response to potential outbreaks. A portable virus capture device coupled with label-free Raman spectroscopy holds the promise of fast detection: the device rapidly obtains the Raman signature of a virus, and a machine learning approach then recognizes the virus from its Raman spectrum, which serves as a fingerprint. We present such a machine learning approach for analyzing Raman spectra of human and avian viruses. A Convolutional Neural Network (CNN) classifier specifically designed for spectral data achieves very high accuracy on a variety of virus type and subtype identification tasks. In particular, it achieves 99% accuracy for classifying influenza virus type A vs. type B, 96% accuracy for classifying four subtypes of influenza A, 95% accuracy for differentiating enveloped and non-enveloped viruses, and 99% accuracy for differentiating avian coronavirus (infectious bronchitis virus, IBV) from other avian viruses. Furthermore, interpreting the neural network responses of the trained CNN model with a full-gradient algorithm highlights the Raman spectral ranges that are most important to virus identification. By correlating these ML-selected salient Raman ranges with the signature ranges of known biomolecules and chemical functional groups (for example, amide, amino acid, carboxylic acid), we verify that our ML model effectively recognizes the Raman signatures of proteins, lipids and other vital functional groups present in different viruses and uses a weighted combination of these signatures to identify viruses.
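The correlation step described in the abstract, matching ML-selected salient Raman ranges against the signature ranges of known functional groups, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`salient_ranges`, `overlap`, `match_bands`), the threshold, the importance scores, and the reference bands `band_A` and `band_B` are all hypothetical placeholders chosen for illustration; they are not real band assignments or results from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[float, float]  # (start, end) in wavenumber units, e.g. cm^-1


def salient_ranges(wavenumbers: Sequence[float],
                   importance: Sequence[float],
                   threshold: float) -> List[Range]:
    """Group consecutive points with importance >= threshold into ranges."""
    ranges: List[Range] = []
    start = None
    last = None
    for w, score in zip(wavenumbers, importance):
        if score >= threshold:
            if start is None:
                start = w          # open a new salient range
            last = w               # extend the current range
        elif start is not None:
            ranges.append((start, last))  # close the range
            start = None
    if start is not None:          # range still open at the end of the spectrum
        ranges.append((start, last))
    return ranges


def overlap(a: Range, b: Range) -> float:
    """Length of the intersection of two ranges (0.0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def match_bands(ranges: Sequence[Range],
                reference_bands: Dict[str, Range]) -> Dict[str, float]:
    """Total overlap of the salient ranges with each reference band."""
    matches: Dict[str, float] = {}
    for name, band in reference_bands.items():
        total = sum(overlap(r, band) for r in ranges)
        if total > 0:
            matches[name] = total
    return matches


# Hypothetical importance scores, standing in for the output of an
# attribution method applied to a trained classifier.
wavenumbers = [1600.0, 1620.0, 1640.0, 1660.0, 1680.0, 1700.0, 1720.0]
importance = [0.1, 0.7, 0.9, 0.8, 0.2, 0.1, 0.6]

# Placeholder reference bands; NOT real functional-group assignments.
reference_bands = {"band_A": (1600.0, 1650.0), "band_B": (1800.0, 1900.0)}

selected = salient_ranges(wavenumbers, importance, threshold=0.5)
print(selected)
print(match_bands(selected, reference_bands))
```

In practice the importance scores would come from the interpretation method applied to the trained model, and the reference bands from published Raman band assignments. A single isolated salient point yields a zero-width range, which this sketch treats as having no overlap with any band.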