Paper title
SARS-CoV-2 virus RNA sequence classification and geographical analysis with a convolutional neural network approach
Paper authors
Abstract
The COVID-19 infection, which has spread across the world since December 2019 and is still active, has caused more than 250 thousand deaths worldwide to date. Research on this subject has focused on analyzing the genetic structure of the virus, developing vaccines, the course of the disease, and its source. In this study, RNA sequences belonging to the SARS-CoV-2 virus are transformed into gene motifs with two basic image processing algorithms and classified with convolutional neural network (CNN) models. The CNN models achieved an average Area Under the Curve (AUC) value of 98% on RNA sequences classified as Asia, Europe, America, and Oceania. The resulting artificial neural network model was used for phylogenetic analysis of the variant of the virus isolated in Turkey. The classification results were compared with gene alignment values in the GISAID database, where SARS-CoV-2 virus records from all over the world are kept. Our experimental results suggest that detecting the geographic distribution of the virus with CNN models may serve as an efficient method.
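The abstract does not say which two image processing algorithms turn a sequence into an image, so the following is only an illustrative sketch of the general idea of presenting a nucleotide sequence to a CNN as a 2D grid. The mapping `NUCLEOTIDE_TO_PIXEL`, the function `sequence_to_image`, and the square-grid layout are all assumptions made for this example, not the authors' method.

```python
import math

# Hypothetical nucleotide-to-intensity mapping; the paper does not specify one.
# 'T' is treated like 'U' because deposited genomes are often written as cDNA.
NUCLEOTIDE_TO_PIXEL = {"A": 64, "C": 128, "G": 192, "U": 255, "T": 255}


def sequence_to_image(sequence, width=None):
    """Map an RNA/cDNA sequence to a 2D grid of grayscale values.

    Unknown symbols (for example 'N') and padding cells are encoded as 0.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    if width is None:
        # Smallest square-ish grid that holds the whole sequence.
        width = math.ceil(math.sqrt(len(seq)))
    pixels = [NUCLEOTIDE_TO_PIXEL.get(base, 0) for base in seq]
    # Pad the final row with zeros so every row has the same length.
    pixels += [0] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


image = sequence_to_image("AUGGCUNAC")
for row in image:
    print(row)
```

A grid like this could then be fed to an image classifier with one output per region label (Asia, Europe, America, Oceania). A fixed `width` would normally be chosen so that every genome yields an image of the same shape.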