Paper Title

Breaking the Communities: Characterizing community changing users using text mining and graph machine learning on Twitter

Paper Authors

Albanese, Federico, Lombardi, Leandro, Feuerstein, Esteban, Balenzuela, Pablo

Abstract

Even though the Internet and social media have increased the amount of news and information people can consume, most users are only exposed to content that reinforces their positions and isolates them from other ideological communities. This environment has real consequences with great impact on our lives, such as severe political polarization, the easy spread of fake news, political extremism, hate groups and the lack of enriching debates, among others. Therefore, encouraging conversations between different groups of users and breaking closed communities is important for healthy societies. In this paper, we characterize and study users who break their community on Twitter using natural language processing techniques and graph machine learning algorithms. In particular, we collected 9 million Twitter messages from 1.5 million users and constructed the retweet networks. We identified their communities and the topics of discussion associated with them. With this data, we present a machine learning framework for classifying social media users that detects "community breakers", i.e. users that swing from their closed community to another one. A feature importance analysis on three polarized political Twitter datasets showed that these users have low PageRank values, suggesting that their change is driven by their messages receiving no response within their communities. This methodology also allowed us to identify their specific topics of interest, providing a full characterization of this kind of user.
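The abstract relies on two graph ingredients: PageRank computed on a retweet network, and the notion of a user who swings from one community to another. The following minimal Python sketch illustrates both. It is not the authors' implementation, and it makes several assumptions not taken from the paper. Each retweet adds an edge from the retweeter to the retweeted author. The damping factor is 0.85. Community labels for each period are already computed (for example by a community detection algorithm) and aligned, so that the same label means the same side in both periods. The helper names `pagerank` and `community_breakers` are hypothetical.

```python
from collections import defaultdict

# Hypothetical helpers for illustration; not the authors' code.


def pagerank(retweets, damping=0.85, max_iter=100, tol=1e-10):
    """Weighted PageRank on a retweet graph via power iteration.

    Assumption: each (retweeter, author) pair adds weight 1 to the edge
    retweeter -> author, so rank flows toward users who get retweeted.
    """
    out_w = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for retweeter, author in retweets:
        out_w[retweeter][author] += 1.0
        nodes.update((retweeter, author))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Users who never retweet (dangling nodes) spread their mass uniformly.
        dangling = sum(rank[v] for v in nodes if v not in out_w)
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for u, targets in out_w.items():
            total = sum(targets.values())
            for v, w in targets.items():
                new[v] += damping * rank[u] * w / total
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def community_breakers(before, after):
    """Users present in both periods whose community label changed.

    Assumption: labels in `before` and `after` are already aligned
    (the same label denotes the same community in both periods).
    """
    return {u for u in before.keys() & after.keys() if before[u] != after[u]}


# Toy data (invented for illustration only).
retweets = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u3", "b"), ("u4", "b")]
pr = pagerank(retweets)

before = {"u1": "A", "u2": "A", "u3": "B"}
after = {"u1": "A", "u2": "B", "u3": "B", "u5": "A"}
breakers = community_breakers(before, after)

print(sorted(breakers))  # ['u2']
for user in sorted(pr, key=pr.get, reverse=True):
    print(user, round(pr[user], 4))
```

In a setup like this, each user's PageRank could serve as one input feature for a classifier that separates community breakers from users who stay. The abstract reports that breakers tend to have low PageRank values. The classifier itself, its other features, and the topic modelling step are not sketched here.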