TIMME: Twitter Ideology-detection via Multi-task Multi-relational Embedding

Title

TIMME: Twitter Ideology-detection via Multi-task Multi-relational Embedding

Authors

Zhiping Xiao, Weiping Song, Haoyan Xu, Zhicheng Ren, Yizhou Sun

Abstract

We aim to solve the problem of predicting people's ideology, or political tendency. We estimate it from Twitter data and formalize it as a classification problem. Ideology detection has long been a challenging yet important problem. Certain groups, such as policy makers, rely on it to make wise decisions. In the past, when labor-intensive survey studies were needed to collect public opinion, analyzing ordinary citizens' political tendencies was difficult. The rise of social media such as Twitter has enabled us to gather ordinary citizens' data easily. However, the incompleteness of the labels and features in social network datasets is tricky, not to mention the enormous data size and heterogeneity. The data differ dramatically from many commonly used datasets and thus bring unique challenges. In our work, we first built our own datasets from Twitter. Next, we proposed TIMME, a multi-task multi-relational embedding model that works efficiently on sparsely labeled, heterogeneous real-world datasets. It can also handle incomplete input features. Experimental results showed that TIMME overall outperforms state-of-the-art models for ideology detection on Twitter. Our findings include: links alone, without text, can lead to good classification outcomes; conservative voices are under-represented on Twitter; follow is the most important relation for predicting ideology; and retweets and mentions increase the chance of a like. Last but not least, TIMME could in principle be extended to other datasets and tasks.
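The abstract does not spell out TIMME's layers. As a rough illustration of what a "multi-task multi-relational embedding" can look like, here is a minimal sketch of a generic R-GCN-style layer (one weight matrix per relation type, mean over incoming neighbours) that feeds two task heads: a two-class softmax for user classification and a DistMult-style link score per relation. This is not the authors' architecture. The relation names, the mean aggregation, the two-class layout, the DistMult scorer, and every weight value are assumptions chosen for illustration, and the code is plain Python so it runs without extra libraries.

```python
import math

# Hypothetical relation types, named after the relations mentioned in the abstract.
# Each edge is a (source_user, target_user) pair; messages flow source -> target.
RELATIONS = ("follow", "retweet", "like", "mention")


def matvec(w, x):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def vec_add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def relational_layer(features, edges, w_self, w_rel):
    """One R-GCN-style layer (a generic technique, not TIMME's exact layer):
    a self-loop term plus, for each relation, the mean of relation-specific
    transforms of the node's incoming neighbours, followed by ReLU."""
    out_dim = len(w_self)
    embeddings = []
    for i, x_i in enumerate(features):
        h = matvec(w_self, x_i)
        for rel in RELATIONS:
            sources = [src for src, dst in edges.get(rel, []) if dst == i]
            if not sources:
                continue  # no edges of this relation type point at node i
            msg = [0.0] * out_dim
            for j in sources:
                msg = vec_add(msg, matvec(w_rel[rel], features[j]))
            h = vec_add(h, [m / len(sources) for m in msg])
        embeddings.append([max(0.0, v) for v in h])
    return embeddings


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def classify(embeddings, w_cls):
    """Task 1: per-user class probabilities (two hypothetical ideology classes)."""
    return [softmax(matvec(w_cls, h)) for h in embeddings]


def link_probability(embeddings, i, j, rel_diag):
    """Task 2: DistMult-style probability that user i links to user j under one
    relation, using a relation-specific diagonal vector rel_diag and a sigmoid."""
    s = sum(a * r * b for a, r, b in zip(embeddings[i], rel_diag, embeddings[j]))
    return 1.0 / (1.0 + math.exp(-s))


# Toy usage: three users with 2-dimensional features and identity weights.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = {"follow": [(0, 2), (1, 2)], "retweet": [(2, 0)]}
identity = [[1.0, 0.0], [0.0, 1.0]]
emb = relational_layer(features, edges, identity, {r: identity for r in RELATIONS})
probs = classify(emb, [[1.0, -1.0], [-1.0, 1.0]])
p_follow = link_probability(emb, 0, 2, [1.0, 1.0])
```

Because both heads read the same relation-aware embeddings, training them jointly lets link structure (for example, who follows whom) inform the classifier even for users without text, which is the intuition behind the abstract's finding that links alone can classify well. In a real model the weights would be learned rather than fixed to the identity.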
