C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed Social Media Text using Feature Engineering

Paper title

C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed Social Media Text using Feature Engineering

Authors

Advani, Laksh, Lu, Clement, Maharjan, Suraj

Abstract

In today's interconnected and multilingual world, code-mixing of languages on social media is a common occurrence. While many Natural Language Processing (NLP) tasks like sentiment analysis are mature and well designed for monolingual text, techniques to apply these tasks to code-mixed text still warrant exploration. This paper describes our feature engineering approach to sentiment analysis in code-mixed social media text for SemEval-2020 Task 9: SentiMix. We tackle this problem by leveraging a set of hand-engineered lexical, sentiment, and metadata features to design a classifier that can disambiguate between "positive", "negative" and "neutral" sentiment. With this model, we are able to obtain a weighted F1 score of 0.65 for the "Hinglish" task and 0.63 for the "Spanglish" task.
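
The abstract reports results as weighted F1 scores. As a reading aid, the sketch below shows how a support-weighted F1 is commonly computed over the three sentiment labels: a per-class F1, averaged with weights proportional to each class's share of the true labels. The function name `weighted_f1` and the toy labels are assumptions made for illustration; they are not taken from the paper, and the task's official scorer may differ in details.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Toy data, invented for this example only.
gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "negative", "negative", "neutral"]
print(round(weighted_f1(gold, pred), 4))
```

In this toy case the "positive" and "negative" classes each score an F1 of 2/3 and "neutral" scores 1, so the weighted result is 0.5 × 2/3 + 0.25 × 2/3 + 0.25 × 1 = 0.75.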
