Twitter Sentiment Analysis with Spatial Visualisation and Analysis

I recently had the opportunity to intern at UST Software (India) Pvt. Ltd. It was a wonderful experience and I learned a lot. During the internship my team was assigned a Twitter sentiment analysis project, and a major part of it was analysing and visualising spatial data.

In this blog I will go through the approaches we followed and the challenges we faced along the way. If you want to follow along with our project, you can view the code on GitHub at the following link.

I have also shared my experience in a YouTube video, which you can watch at this link.

Methodology

Looking back, we tried two different approaches that gave us similar outputs while using different technology stacks.

In our first approach, we collected Twitter data using tweepy and stored it in MySQL. We then used TextBlob and NLTK for preprocessing and sentiment analysis, which gave us two properties for each tweet: polarity (positive/negative) and subjectivity. Finally, we used Plotly for data visualisation.

The output of our first approach looked like the following images. We followed the approach of a really great article on Medium, and the GitHub link for that code is here.

We finalised the project with the second approach because VADER sentiment analysis works better on text from social media, and on general text as well. MongoDB was also a better fit for the JSON documents we were dealing with.

In our second approach (which we later finalised), we collected Twitter data and stored it in a MongoDB instance. We then used VADER (Valence Aware Dictionary and sEntiment Reasoner) for sentiment analysis; it is sensitive to both the polarity (positive/negative) and the intensity of the emotion in a text. VADER relies on a dictionary that maps lexical features to emotion intensities, known as sentiment scores. We then computed a sentiment score for each location (country), and finally used Dash and matplotlib for interactive data visualisation.

Second approach using MongoDB, VADER, matplotlib and Dash
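
To make the idea of a lexicon-based score concrete, here is a minimal sketch. It is not VADER itself: the tiny lexicon, its valence values and the function name toy_compound are made up for illustration, and VADER's rules for negation, punctuation and capitalisation are left out. The only part borrowed from VADER is the final normalisation, x / sqrt(x^2 + 15), which squashes the summed valence into the range -1 to 1.

```python
import math

# Hypothetical mini-lexicon (word -> valence); VADER's real lexicon is far larger.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}


def toy_compound(text, alpha=15):
    """Sum the valence of known words and squash the total into (-1, 1)."""
    total = sum(TOY_LEXICON.get(word, 0.0) for word in text.lower().split())
    return total / math.sqrt(total * total + alpha)


print(toy_compound("great news today"))      # a positive value
print(toy_compound("terrible bad service"))  # a negative value
print(toy_compound("just landed"))           # 0.0, no known words
```

The sign of the result carries the polarity and its magnitude carries the intensity, which is the property that made this kind of score convenient to average per country.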

Data Collection

We initially thought of using the geocode parameter of the Twitter search API. We pass a latitude and longitude along with a radius, and it returns tweets from that region. However, we found that this does not work well, because less than 0.85% of tweets are geotagged. When we downloaded around 11,700 tweets from the stream, only around 35 of them were geotagged.
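
As a quick check on those numbers, the share of geotagged tweets in our sample works out as follows (both figures are the ones quoted above):

```python
total_tweets = 11_700  # tweets downloaded from the stream
geotagged = 35         # tweets that carried a geotag

share = geotagged / total_tweets
print(f"{share:.2%} of the sample was geotagged")
```

That is roughly three tweets in every thousand, which is far too few to build a per-country map from.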

So instead of the location of the tweet itself, we recorded the location from the user's profile. This gave us a much larger number of eligible tweets, although the locations still contained garbage values.
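
Reading the profile location can be sketched as below. The helper name location_tokens is hypothetical, and splitting on commas is an assumption about how to break up entries such as "Kochi, India"; the user and location keys are the ones found in a standard tweet JSON payload.

```python
def location_tokens(tweet):
    """Split the user's free-text profile location into lowercase candidate tokens."""
    location = (tweet.get("user") or {}).get("location") or ""
    return [part.strip().lower() for part in location.split(",") if part.strip()]


# Hypothetical tweet payloads, trimmed to the fields used here.
sample = {"text": "hello", "user": {"location": "Kochi, India"}}
print(location_tokens(sample))                        # ['kochi', 'india']
print(location_tokens({"user": {"location": None}}))  # []
```

Each token can then be checked against a list of known country names and codes.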

Another issue was that the locations users enter are not standardised. Some people wrote the name of their country and some the name of their city, while others wrote the name of their galaxy.

To get around this problem we used a module called pycountry. We first made a list of all the country names, including the 2- and 3-letter codes for each country, because in the location box many users enter USA for United States of America or IN for India. We need to make sure that India, IN and IND all map to the same country. Here is the logic for doing this.

if i in listofcountries:
    # Map 2- and 3-letter codes to the full, cleaned country name
    if len(i) == 2:
        country = pycountry.countries.get(alpha_2=i.upper())
        i = puncremover(country.name.lower())
    elif len(i) == 3:
        country = pycountry.countries.get(alpha_3=i.upper())
        i = puncremover(country.name.lower())
    # One MongoDB collection per country
    collection = db[i]
    # print(i, status.text)
    try:
        collection.insert_one(status._json)
    except Exception as e:
        print(e)
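
The same idea can be written as a single lookup table. This is a minimal sketch, not our project code: COUNTRIES is a hand-written, hypothetical stand-in for the records pycountry provides (each of which has name, alpha_2 and alpha_3 attributes), and to_country is a made-up helper name.

```python
# Hypothetical stand-in for pycountry's records: (name, alpha_2, alpha_3).
COUNTRIES = [
    ("India", "IN", "IND"),
    ("United States", "US", "USA"),
    ("France", "FR", "FRA"),
]

# Every lowercase name or code points at one canonical lowercase name.
CANONICAL = {}
for name, alpha_2, alpha_3 in COUNTRIES:
    for key in (name, alpha_2, alpha_3):
        CANONICAL[key.lower()] = name.lower()


def to_country(token):
    """Return the canonical country name for a token, or None if it is not a country."""
    return CANONICAL.get(token.strip().lower())


print(to_country("IND"))        # india
print(to_country("usa"))        # united states
print(to_country("Milky Way"))  # None
```

One thing to watch with a table like this is that two-letter codes can collide with ordinary words (for example "in"), so it is safer to match whole location tokens than words inside the tweet text.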

puncremover is a function that removes any punctuation from the country name and standardises it.

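# Raw string so that the backslash is treated as a literal character
punc = r'''!-[]{};:'"\,<>./?@#$%^&*_~'''


def puncremover(test_str):
    # Drop every character of test_str that appears in punc
    for ele in test_str:
        if ele in punc:
            test_str = test_str.replace(ele, "")
    return test_str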
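
For reference, the same clean-up can be done in one pass with str.translate. This is an alternative sketch rather than what we ran; strip_punctuation is a hypothetical name, and PUNC is the same character set as above.

```python
# Same punctuation set as punc above, as a raw string.
PUNC = r'''!-[]{};:'"\,<>./?@#$%^&*_~'''

# Translation table that deletes every character in PUNC.
_DELETE_PUNC = str.maketrans("", "", PUNC)


def strip_punctuation(text):
    """Remove all characters listed in PUNC from text."""
    return text.translate(_DELETE_PUNC)


print(strip_punctuation("côte d'ivoire"))  # côte divoire
print(strip_punctuation("guinea-bissau"))  # guineabissau
```

Building the table once and reusing it avoids rescanning the string for every punctuation character it contains.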

Finally, after all the preprocessing, we insert the tweets into our MongoDB database.

Sentiment Analysis and Visualisation

We simply iterate through all the collections and run the lexicon-based analysis using VADER. It runs very quickly, even across all the collections and documents.

Finally, we use the Python Dash modules to create a dashboard; the visual itself is a Plotly plot, which is compatible with Dash. That gave us our end result.
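
The shape of that loop is sketched below under some assumptions. The collections are replaced by a plain dictionary of tweet texts per country, and the scorer is passed in as a function, so a toy one can stand in for VADER here; with the vaderSentiment package the scorer would be the compound value from SentimentIntensityAnalyzer().polarity_scores(text). The names summarise and label are hypothetical, and the 0.05 cut-offs are the ones commonly used with VADER's compound score.

```python
from statistics import mean


def label(compound):
    """Bucket a compound score using the commonly used +/-0.05 cut-offs."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def summarise(collections, score):
    """Average the score of every tweet text in each per-country collection."""
    summary = {}
    for country, texts in collections.items():
        scores = [score(text) for text in texts]
        if scores:  # skip countries with no tweets
            summary[country] = {
                "score": mean(scores),
                "number of tweets": len(scores),
            }
    return summary


# Hypothetical data and scorer, only to show the shape of the loop.
fake_scores = {"love it": 0.6, "hate it": -0.6, "ok": 0.0}
collections = {"india": ["love it", "ok"], "france": ["hate it"], "chad": []}

result = summarise(collections, fake_scores.get)
print(result)
print(label(result["india"]["score"]), label(result["france"]["score"]))
```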
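
The choropleth call that follows reads the columns country code, score, number of tweets and country name from a data frame. A minimal sketch of building rows in that shape is given here. The helper to_rows and the sample values are hypothetical, alpha-3 codes are assumed to match the ids in the GeoJSON, and rescaling a -1 to 1 score into the 0 to 1 colour range is an assumption made for this example, not a statement of what our pipeline did.

```python
def to_rows(summary, codes):
    """Turn {country: stats} into one flat row per country for plotting."""
    rows = []
    for country, stats in summary.items():
        rows.append({
            "country name": country,
            "country code": codes[country],     # assumed ISO alpha-3 code
            "score": (stats["score"] + 1) / 2,  # assumed rescale from [-1, 1] to [0, 1]
            "number of tweets": stats["number of tweets"],
        })
    return rows


# Hypothetical per-country summary and code table.
summary = {
    "india": {"score": 0.5, "number of tweets": 2},
    "france": {"score": -1.0, "number of tweets": 1},
}
codes = {"india": "IND", "france": "FRA"}

rows = to_rows(summary, codes)
print(rows)
```

A list of dictionaries like this can be passed straight to pandas.DataFrame to get a frame with those four columns.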

import plotly.express as px

# df holds one row per country; countries is the GeoJSON with the country shapes
fig = px.choropleth(df, geojson=countries, locations='country code', color='score',
                    color_continuous_scale="sunset",
                    range_color=(0, 1),
                    scope="world",
                    labels={'score': 'sentiment score'},
                    projection="orthographic",
                    hover_data=["number of tweets", "country name"]
                    )
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()

Here is the sample of the output we got after executing the above code.

import dash
from dash import dcc, html  # Dash 2.0+; older releases use dash_core_components / dash_html_components
import pandas as pd
import plotly.graph_objects as go

import Untitled  # assumed import: the project's own module providing topics and databases

topics = Untitled.topics

app = dash.Dash()

app.layout = html.Div([
    html.H2("Sentiment Analysis"),
    html.Div(
        [
            dcc.Dropdown(
                id="Topics",
                options=[{
                    'label': i,
                    'value': i
                } for i in topics],
                value='All Topics'),
        ],
        style={'width': '25%',
               'display': 'inline-block'}),
    dcc.Graph(id='funnel-graph'),
])


@app.callback(
    dash.dependencies.Output('funnel-graph', 'figure'),
    [dash.dependencies.Input('Topics', 'value')])
def update_graph(Topics):
    # Fall back to the coronavirus topic when the selection is not a known topic
    if Topics in topics:
        df_plot = Untitled.databases[Topics]
    else:
        df_plot = Untitled.databases['coronavirus_covid']

    # Aggregate per country (pivot_table uses the mean by default)
    pv = pd.pivot_table(
        df_plot,
        index=['country'],
    )

    # One stacked bar per country, split by sentiment class
    trace1 = go.Bar(x=pv.index, y=pv['positive'], name='Positive')
    trace2 = go.Bar(x=pv.index, y=pv['neutral'], name='Neutral')
    trace3 = go.Bar(x=pv.index, y=pv['negative'], name='Negative')

    return {
        'data': [trace1, trace2, trace3],
        'layout':
        go.Layout(
            title='Sentiments for {}'.format(Topics),
            barmode='stack')
    }


if __name__ == '__main__':
    app.run_server(debug=True)

We got the following output after successfully executing the above code.
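
If pivot_table is unfamiliar: called with only index=['country'], it groups the rows by country and, by default, takes the mean of the value columns. The plain-Python sketch below does the same thing for the three columns the chart uses. The sample rows are hypothetical and assume one row per tweet holding positive, neutral and negative proportions; mean_by_country is a made-up helper name.

```python
from collections import defaultdict
from statistics import mean


def mean_by_country(rows, columns=("positive", "neutral", "negative")):
    """Group rows by country and average each sentiment column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["country"]].append(row)
    return {
        country: {col: mean(r[col] for r in group) for col in columns}
        for country, group in sorted(grouped.items())
    }


# Hypothetical per-tweet rows.
rows = [
    {"country": "india", "positive": 0.5, "neutral": 0.5, "negative": 0.0},
    {"country": "india", "positive": 0.25, "neutral": 0.5, "negative": 0.25},
    {"country": "france", "positive": 0.0, "neutral": 0.75, "negative": 0.25},
]

table = mean_by_country(rows)
print(table)
```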

Thank you for reading this article; hopefully you gained some useful insights from my experience. If you have any feedback or doubts about the article, leave a comment and I'll make sure to reply.

Devansh Sharma
