Paper title

Characterising User Content on a Multi-lingual Social Network

Authors

Pushkal Agarwal, Kiran Garimella, Sagar Joglekar, Nishanth Sastry, Gareth Tyson

Abstract

Social media has been in the vanguard of political information diffusion in the 21st century. Most studies that look into disinformation, political influence and fake news focus on mainstream social media platforms. This has inevitably made English an important factor in our current understanding of political activity on social media. As a result, there have been only a limited number of studies into a large portion of the world, including the largest multilingual and multicultural democracy: India. In this paper we present our characterisation of a multilingual social network in India called ShareChat. We collect an exhaustive dataset across 72 weeks before and during the Indian general elections of 2019, across 14 languages. We investigate the cross-lingual dynamics by clustering visually similar images together and exploring how they move across language barriers. We find that Telugu, Malayalam, Tamil and Kannada languages tend to be dominant in soliciting political images (often referred to as memes), and that posts from Hindi, as well as images containing text in English, have the largest cross-lingual diffusion across ShareChat. For images containing text that cross language barriers, we see that translation is used to widen accessibility. That said, we find cases where the same image is associated with very different text (and therefore meanings). This initial characterisation paves the way for more advanced pipelines to understand the dynamics of fake and political content in a multilingual and non-textual setting.
