Paper Title
IIT Gandhinagar at SemEval-2020 Task 9: Code-Mixed Sentiment Classification Using Candidate Sentence Generation and Selection
Paper Authors
Paper Abstract
Code-mixing is the phenomenon of using multiple languages in the same utterance of text or speech. It is a frequently used pattern of communication on various platforms such as social media sites, online gaming, and product reviews. Sentiment analysis of monolingual text is a well-studied task. Code-mixing adds to the challenge of analyzing the sentiment of text because of its non-standard writing style. We present a candidate sentence generation and selection based approach on top of a Bi-LSTM-based neural classifier to classify Hinglish code-mixed text into one of three sentiment classes: positive, negative, or neutral. The proposed approach shows an improvement in system performance compared to the Bi-LSTM-based neural classifier. The results present an opportunity to understand various other nuances of code-mixing in textual data, such as humor detection and intent classification.