Sentiment Analysis of Cybersecurity Content on Twitter and Reddit

Paper Title

Sentiment Analysis of Cybersecurity Content on Twitter and Reddit

Author

Thapa, Bipun

Abstract

Sentiment analysis offers a way to understand a subject, especially in the digital age, given the abundance of public data and effective algorithms. Cybersecurity is a subject on which public opinions are plentiful and divergent. This descriptive study analyzed cybersecurity content on Twitter and Reddit to measure whether its sentiment was positive, negative, or neutral. Data from Twitter and Reddit were collected via technology-specific APIs during a selected timeframe to create datasets, which were then analyzed individually for sentiment by VADER, an NLP (Natural Language Processing) algorithm. A random sample of cybersecurity content (ten tweets and posts) was also classified by twenty human annotators to evaluate VADER's performance. Cybersecurity content on Twitter was at least 48% positive, and on Reddit at least 26.5% positive. Positive or neutral content far outweighed negative sentiment on both platforms. Compared with human classification, which was treated as the standard or source of truth, VADER achieved 60% accuracy for Twitter and 70% for Reddit in assessing sentiment; in other words, there was some agreement between the algorithm and the human classifiers. Overall, the goal was to explore an uninhibited research topic about cybersecurity sentiment.
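
The evaluation described in the abstract (labelling each item from its VADER score, then measuring agreement with human annotators) might be sketched as follows. This is an illustrative sketch only, not the author's code. It assumes the ±0.05 compound-score cutoffs commonly used with VADER and a majority vote across annotators; the abstract specifies neither. The scores and votes below are hypothetical placeholder data, not results from the study.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    The 0.05 cutoff is an assumption (the conventional VADER default),
    not a value stated in the abstract.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def majority_label(votes):
    """Return the most common label among annotator votes (assumed aggregation rule)."""
    return Counter(votes).most_common(1)[0][0]


def accuracy(predicted, reference):
    """Fraction of items where the predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical compound scores for five posts (placeholder data).
compound_scores = [0.62, -0.41, 0.01, 0.30, -0.02]

# Hypothetical votes from three annotators per post (the study used twenty).
annotator_votes = [
    ["positive", "positive", "neutral"],
    ["negative", "negative", "negative"],
    ["neutral", "positive", "neutral"],
    ["neutral", "neutral", "positive"],
    ["neutral", "neutral", "negative"],
]

predicted = [label_from_compound(score) for score in compound_scores]
reference = [majority_label(votes) for votes in annotator_votes]
print(f"accuracy vs. human majority: {accuracy(predicted, reference):.0%}")
```

In practice the compound scores would come from a VADER implementation rather than a hard-coded list; the human labels serve as the reference, in line with the abstract's treatment of human classification as the source of truth.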
