Modeling Public Sentiment in Twitter

Keywords: Natural Language Processing, Machine Learning, Affective Computing, Commonsense Computing

Understanding the processes underlying human affective and communicative phenomena can also help us build better algorithms for predicting such phenomena. For my bachelor’s thesis with Prof. Erik Cambria, I developed an algorithm for Twitter sentiment analysis that exploits our understanding of how humans pick up on linguistic cues and patterns to infer an affective phenomenon called sentiment. Unlike humans, most ML methods do not treat sentences with different linguistic patterns differently and do not leverage commonsense knowledge. To overcome these limitations, I built a hybrid classifier that detects sentiment in Tweets by using an unsupervised rule-based classifier to modify the low-confidence predictions of a supervised classifier (SVM). The features of the supervised classifier were modified according to the linguistic patterns occurring in the text, while the unsupervised classifier applied rules based on commonsense concepts extracted from the text. The final system achieved an F-score 4 units higher than the baseline.
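
The hybrid decision step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the thesis implementation: the function names, the confidence threshold of 0.5, and the toy word weights and concept rules are all assumptions made up for the example. The only idea taken from the description is that the supervised prediction is kept when it is confident, and a rule-based classifier may override it when it is not.

```python
from typing import Callable, Optional

Label = str  # "positive" or "negative"


def hybrid_predict(
    text: str,
    decision_score: Callable[[str], float],
    rule_label: Callable[[str], Optional[Label]],
    threshold: float = 0.5,  # assumed value, not taken from the thesis
) -> Label:
    """Keep the supervised label unless its confidence is low and a rule fires."""
    score = decision_score(text)  # signed margin, similar to an SVM decision value
    supervised = "positive" if score >= 0 else "negative"
    if abs(score) >= threshold:
        return supervised  # confident: trust the supervised classifier
    fallback = rule_label(text)  # low confidence: consult the rules
    return fallback if fallback is not None else supervised


# Toy stand-ins (hypothetical), only here to make the sketch runnable.
TOY_WEIGHTS = {"love": 1.0, "great": 0.8, "hate": -1.0, "meh": -0.1}
TOY_CONCEPTS = {"waste of time": "negative", "piece of cake": "positive"}


def toy_decision_score(text: str) -> float:
    """Sum of per-word weights, standing in for a trained SVM's margin."""
    return sum(TOY_WEIGHTS.get(tok, 0.0) for tok in text.lower().split())


def toy_rule_label(text: str) -> Optional[Label]:
    """Return a label if a known concept occurs in the text, else None."""
    lowered = text.lower()
    for concept, label in TOY_CONCEPTS.items():
        if concept in lowered:
            return label
    return None


if __name__ == "__main__":
    for tweet in ["I love this phone", "that talk was a waste of time"]:
        print(tweet, "->", hybrid_predict(tweet, toy_decision_score, toy_rule_label))
```

In a real system the `decision_score` argument would wrap a trained classifier, and the threshold would be tuned on held-out data.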

This project led to two publications, which together have over 60 citations:

  • Chikersal, P., Poria, S., Cambria, E., Gelbukh, A., & Siong, C. E. (2015, April). Modelling Public Sentiment in Twitter: Using Linguistic Patterns to Enhance Supervised Learning. In International Conference on Intelligent Text Processing and Computational Linguistics (pp. 49-65). Springer International Publishing. [link to paper] [link to presentation slides]
  • Chikersal, P., Poria, S., & Cambria, E. (2015). SeNTU: Sentiment Analysis of Tweets by Combining a Rule-based Classifier with Supervised Learning. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) (pp. 647-651). Association for Computational Linguistics. [link to paper]

My bachelor’s thesis combines these two papers:

  • Chikersal, P. (2015, May). Modelling Public Sentiment in Twitter. Bachelor Thesis, Nanyang Technological University, Singapore. [link to thesis]

Contributions of this project include an algorithm for extracting multiword concepts from Tweets, which can then be used to query commonsense-based sentiment resources such as SenticNet. This project is also one of the first to apply sentic computing to sentiment analysis of social media text. Briefly,

Sentic computing is a multi-disciplinary approach to natural language processing and understanding at the crossroads between affective computing, information extraction, and commonsense reasoning, which exploits both computer and human sciences to better interpret and process social information on the Web. In sentic computing, whose term derives from the Latin ‘sentire’ (root of words such as sentiment and sentience) and ‘sensus’ (as in commonsense), the analysis of natural language is based on linguistics and commonsense reasoning tools, which enable the analysis of text not only at document-, page- or paragraph-level, but also at sentence-, clause-, and concept-level.
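
To give a feel for concept-level analysis, here is a small Python sketch of looking up multiword concepts in a polarity resource. It is only an illustration under stated assumptions: it uses plain n-gram matching rather than the concept extraction algorithm developed in the thesis, and `TOY_POLARITY` is a made-up dictionary standing in for a resource like SenticNet, with invented polarity values.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a concept-level resource such as SenticNet.
# The concepts and polarity values are invented for illustration.
TOY_POLARITY: Dict[str, float] = {
    "birthday party": 0.8,
    "miss flight": -0.7,
    "happy": 0.9,
}


def candidate_concepts(tokens: List[str], max_len: int = 3) -> List[str]:
    """Return all contiguous n-grams up to max_len words, longest first."""
    grams: List[str] = []
    for n in range(max_len, 0, -1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def lookup_concepts(
    text: str, resource: Dict[str, float] = TOY_POLARITY
) -> List[Tuple[str, float]]:
    """Return (concept, polarity) pairs for every candidate found in the resource."""
    tokens = text.lower().split()
    return [(g, resource[g]) for g in candidate_concepts(tokens) if g in resource]


if __name__ == "__main__":
    print(lookup_concepts("Happy birthday party tonight"))
```

Matching longer candidates first means a multiword concept such as "birthday party" is reported before its single-word parts, which is the main reason to look beyond individual words.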