I’m back with another project to share. First, a little background on how it all started. I’m taking a Data Mining course here at IU, and my professor was talking about classification using Bayes’ Theorem. From that, I ended up studying a little about the Naive Bayes classifier, and then I felt like learning Sentiment Analysis to get a kick out of it. I also felt like using Twitter for my data because lately I’ve taken an interest in the Network Science research here at IU, and they do a lot of Twitter-related research.
I’ll start by defining what Sentiment Analysis is: the use of natural language processing, text analysis, and computational analysis to identify, extract, quantify, and study affective states and subjective information.
Coming back to the project: I used Tweepy (a Python package for accessing the Twitter API). It starts with applying for a Twitter Developer Account and making sure you get the Elevated plan (don’t worry, it’s free), which comes with v2 of the Twitter API. The Essential plan won’t be sufficient for this project.
Next, you set up your API access by generating the API key and tokens, and then you move on to gathering tweets from your favorite (or just a popular) Twitter handle. For my project, I started with Barack Obama’s tweets. You gather 100+ tweets with Tweepy, clean the text, and then calculate the subjectivity and polarity of each tweet using the TextBlob package.
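A minimal sketch of the fetch-and-clean step might look like this. It is not necessarily the exact code from the project: it assumes Tweepy 4.x with its v2 `Client` and a bearer token, and the names `fetch_recent_tweets` and `clean_tweet`, as well as the particular cleaning rules (dropping retweet markers, links, mentions, and `#` signs), are illustrative choices.

```python
import re


def clean_tweet(text):
    """Strip retweet markers, links, @mentions, and '#' signs from a tweet."""
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)  # leading "RT @user: " marker
    text = re.sub(r"https?://\S+", "", text)    # hyperlinks
    text = re.sub(r"@\w+", "", text)            # remaining @mentions
    text = text.replace("#", "")                # keep the hashtag word, drop '#'
    return re.sub(r"\s+", " ", text).strip()    # collapse leftover whitespace


def fetch_recent_tweets(bearer_token, username, limit=100):
    """Return the text of up to `limit` recent tweets (the API allows 5 to 100 per call)."""
    import tweepy  # third-party; imported here so clean_tweet works without it

    client = tweepy.Client(bearer_token=bearer_token)
    user = client.get_user(username=username)
    response = client.get_users_tweets(user.data.id, max_results=limit)
    # response.data is None when the account has no tweets to return
    return [tweet.text for tweet in (response.data or [])]
```

With a valid token, something like `[clean_tweet(t) for t in fetch_recent_tweets(token, "BarackObama")]` would give you the cleaned list to work with.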
Now you have the cleaned tweets, along with their TextBlob subjectivity and polarity scores, in a Pandas DataFrame, and you can use the WordCloud library to get a better understanding of the words used in the tweets dataset.
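As a rough sketch of this step, assuming the scores come from TextBlob's `sentiment` property and the cloud from the `wordcloud` package: the column names (`Tweets`, `Subjectivity`, `Polarity`), the helper names, and the WordCloud size settings below are placeholders rather than the values used in the project.

```python
def combine_text(tweets):
    """Join all non-empty tweets into one string for the word cloud."""
    return " ".join(t for t in tweets if t)


def score_tweets(tweets):
    """Build a DataFrame with one row per tweet plus TextBlob scores."""
    import pandas as pd            # third-party
    from textblob import TextBlob  # third-party

    df = pd.DataFrame({"Tweets": tweets})
    # subjectivity ranges from 0 (objective) to 1 (subjective)
    df["Subjectivity"] = df["Tweets"].apply(lambda t: TextBlob(t).sentiment.subjectivity)
    # polarity ranges from -1 (negative) to 1 (positive)
    df["Polarity"] = df["Tweets"].apply(lambda t: TextBlob(t).sentiment.polarity)
    return df


def build_wordcloud(tweets):
    """Generate a word cloud in which more frequent words are drawn larger."""
    from wordcloud import WordCloud  # third-party

    cloud = WordCloud(width=500, height=300, max_font_size=110, random_state=21)
    return cloud.generate(combine_text(tweets))
```

The object returned by `build_wordcloud` can be displayed with matplotlib's `plt.imshow(cloud, interpolation="bilinear")`.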
From here on, all you have to do is classify each tweet as Positive, Neutral, or Negative, depending on whether its calculated polarity is greater than, equal to, or less than zero, respectively. Once you’ve done that, you can plot your data however you want to get a better understanding of your result.
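The rule above is small enough to write as a single function. This is only a sketch, and the name `label_sentiment` is made up for illustration.

```python
def label_sentiment(polarity):
    """Map a polarity score to Positive, Neutral, or Negative."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"
```

If the scores live in a DataFrame column called `Polarity` (an assumed name), you could add the labels with `df["Analysis"] = df["Polarity"].apply(label_sentiment)`.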
The way I did it was to calculate the percentage of positive tweets as well as negative tweets. I also made a scatterplot of Subjectivity against Polarity, and later plotted the Analysis of the tweets (the Positive, Neutral, and Negative counts) as a Bar Graph.
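Here is one way those summaries could be sketched, assuming you already have a list of labels and the matching polarity and subjectivity values. The function names, figure size, and axis titles are illustrative, and matplotlib is an assumed choice of plotting library.

```python
from collections import Counter


def sentiment_percentages(labels):
    """Return the share of each label as a percentage, rounded to one decimal."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def plot_sentiment(polarity, subjectivity, labels):
    """Draw a polarity/subjectivity scatterplot next to a bar chart of label counts."""
    import matplotlib.pyplot as plt  # third-party

    fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(12, 5))

    ax_scatter.scatter(polarity, subjectivity)
    ax_scatter.set_xlabel("Polarity")
    ax_scatter.set_ylabel("Subjectivity")
    ax_scatter.set_title("Sentiment Analysis")

    counts = Counter(labels)
    ax_bar.bar(list(counts.keys()), list(counts.values()))
    ax_bar.set_xlabel("Sentiment")
    ax_bar.set_ylabel("Number of tweets")
    return fig
```

For example, `sentiment_percentages(["Positive", "Positive", "Negative", "Neutral"])` gives 50.0 for Positive and 25.0 each for Negative and Neutral.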
And here’s the final bar graph for the sentiment analysis of the last 100 tweets from @BarackObama:
Fairly positive tweets I should say :)
Hope you all like this. Please let me know if you need any clarification on the code I have posted or any other help related to this blog. I’ll write again soon.