Paper Title

Utilizing Deep Learning to Identify Drug Use on Twitter Data

Authors

Joseph Tassone, Peizhi Yan, Mackenzie Simpson, Chetan Mendhe, Vijay Mago, Salimur Choudhury

Abstract

The collection and examination of social media data has become a useful mechanism for studying the mental activity and behavioral tendencies of users. Through the analysis of collected Twitter data, models were developed for classifying drug-related tweets. Using topic-related keywords, such as slang and methods of drug consumption, a set of tweets was generated. Potential candidates were then preprocessed, resulting in a dataset of 3,696,150 rows. The classification power of multiple methods was compared, including support vector machine (SVM), XGBoost, and convolutional neural network (CNN) based classifiers. Rather than simple feature or attribute analysis, a deep learning approach was implemented to screen and analyze the tweets' semantic meaning. The two CNN-based classifiers produced the best results when compared against the other methodologies. The first was trained with 2,661 manually labeled samples, while the other also included synthetically generated tweets, for a total of 12,142 samples. The accuracy scores were 76.35% and 82.31%, with AUCs of 0.90 and 0.91, respectively. Additionally, association rule mining showed that commonly mentioned drugs had a level of correspondence with frequently used illicit substances, demonstrating the practical usefulness of the system. Lastly, the synthetically generated set provided increased scores, improving the classification capability and demonstrating the worth of this methodology.
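
To make the association rule mining step mentioned in the abstract more concrete, the following is a minimal sketch of how support and confidence can be computed for pairs of drug terms that co-occur in tweets. The abstract does not say which algorithm or thresholds the authors used, so the function name `pair_rules`, the thresholds, and the toy tweets are all hypothetical and purely illustrative; this is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

# Toy, made-up "transactions": the set of drug terms mentioned in each tweet.
# These are illustrative placeholders, not data from the paper.
tweets = [
    {"cannabis", "alcohol"},
    {"cannabis", "alcohol", "cocaine"},
    {"cannabis"},
    {"alcohol", "cocaine"},
    {"cannabis", "alcohol"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) rules over pairs of items.

    Hypothetical helper: only two-item rules are considered, which is a
    simplification of full association rule mining.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        item_counts.update(items)
        pair_counts.update(frozenset(p) for p in combinations(sorted(items), 2))

    rules = []
    for pair, count in pair_counts.items():
        support = count / n  # fraction of tweets containing both terms
        if support < min_support:
            continue
        a, b = sorted(pair)
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = P(rhs mentioned | lhs mentioned)
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


rules = pair_rules(tweets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy input, a rule such as `cocaine -> alcohol` is kept because every tweet mentioning the first term also mentions the second, while the reverse direction falls below the confidence threshold. A rule of this kind is what would let mentioned drugs be compared against known patterns of substance use.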
