Hi guys, it’s been a while. I have been busy with my studies. I will try my best to continue updating the blog regularly (i.e., at least 2-3 posts a week), but please don’t hold me to it.
About 5-6 months back, I was struck by the idea of finding out whether my college administration sent more negative emails than positive ones to the students. It seemed to me that most of the emails discouraged and reprimanded the students, rather than encouraging or appreciating them. I had no idea how to do it, but I wanted to do it nonetheless.
Then recently, I came to know that this was possible with something known as Sentiment Analysis. Sentiment analysis uses Natural Language Processing (NLP) to determine whether the text is positive, negative or neutral towards a particular topic or product. I am still learning the basics of Machine Learning and NLP, so I could not develop my own system for it. Instead, I used the Sentiment Analysis API of an existing company.
Basically, what I have done is:
- Search for the emails of a particular sender.
- Go through each one of those emails, and test the sentiment of the body text using the API.
- Round off the sentiment value to either 0 (negative) or 1 (positive).
- Finally, draw a pie chart with the total number of 1s and 0s.
As it turns out, most of the emails are positive rather than negative. I don’t know how reliable the result is, but well, I’m satisfied with it.
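The rounding and counting steps above can be sketched in a few lines of JavaScript (the language Google Apps Script uses). This is only a minimal sketch, not necessarily how the script in the spreadsheet does it: the function name `tallySentiments` and the example scores are made up for illustration, and I am assuming the API returns a score between 0 and 1 for each email.

```javascript
// Round each sentiment score (assumed to be between 0 and 1) to 0 or 1,
// then count how many emails land in each bucket.
// tallySentiments is a hypothetical helper name, not part of any API.
function tallySentiments(scores) {
  const counts = { negative: 0, positive: 0 };
  for (const score of scores) {
    if (Math.round(score) === 1) {
      counts.positive += 1;
    } else {
      counts.negative += 1;
    }
  }
  return counts;
}

// Made-up scores, for illustration only.
const exampleCounts = tallySentiments([0.91, 0.12, 0.67, 0.48, 0.5]);
```

The two counts are what would feed the pie chart. Note that `Math.round` sends a score of exactly 0.5 to 1, so borderline emails count as positive in this sketch.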
Soon enough, another question hit my mind: “Is the length of an email related to its positivity?” Meaning, when the college administration writes an email with bad sentiment, do they end up writing a longer one? I decided to see for myself.
So, what I did next was:
- Use a regex to count the number of words in the email’s body text.
- Test the sentiment of the body text using the API, as before.
- Draw a scatter plot of the number of words vs. positivity (on a scale of 0 to 1, with 0 meaning completely negative and 1 completely positive). This time I did not round the values off to 0 or 1; I left them as they were.
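For the curious, the word-counting step can be sketched like this. The regex here is an assumption (it simply treats every run of non-whitespace characters as a word), and `countWords` and `toScatterPoint` are hypothetical names used only for this example.

```javascript
// Count words by matching runs of non-whitespace characters.
// This is one simple definition of a "word"; other regexes would
// give slightly different counts (e.g. for hyphenated words).
function countWords(text) {
  const matches = text.match(/\S+/g);
  return matches === null ? 0 : matches.length;
}

// Pair the word count with the unrounded sentiment score,
// giving one (x, y) point for the scatter plot.
function toScatterPoint(bodyText, score) {
  return { words: countWords(bodyText), positivity: score };
}
```

Each email then contributes one point: its word count on one axis and its raw positivity score on the other.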
And the result is: contrary to what I had guessed, longer emails (roughly 250-300+ words) seem to be more positive than negative. However, it should be noted that such long emails are too few to draw any conclusions from. (Do you see something I failed to see? Please correct me if I am missing something.)
In the case of another administration official, however, there does not seem to be any relation between the length of an email and its positivity, since the points are spread almost evenly. I did find something interesting though: compared to the earlier official, this person has a habit of sending long emails (100 – 500 words), which is indeed true.
It was a fun activity. My hypotheses were proved wrong by the computer, and I finally found the answer to what had been bothering me for so long.
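Eyeballing a scatter plot only goes so far. One way to put a number on "is there any relation?" is the Pearson correlation coefficient between word count and positivity. To be clear, this was not part of the analysis above; it is a sketch of an optional extra step, and `pearson` is a hypothetical helper name.

```javascript
// Pearson correlation between two equal-length arrays of numbers.
// Returns a value between -1 and 1; values near 0 suggest no linear relation.
// Returns NaN if either array has no variation (all values identical).
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two arrays of equal length with at least 2 values");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}
```

You would pass the word counts as `xs` and the unrounded positivity scores as `ys`. With only a handful of long emails, though, even this number should be taken with a pinch of salt.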
How to use it for your own email sentiment analysis?
If you also want to check the email sentiment of your boss/senior or some other sender, then go ahead and make a copy of Email Sentiment Analysis with Google Script. Then, register on Indico. We are going to use their web service for Sentiment Analysis. You can sign up using the free plan. The free plan offers about 100k requests a month, which should be more than enough. Log in to the dashboard and copy the key.
On the Spreadsheet, create a new sheet by clicking on the + sign at the bottom of the spreadsheet. Ignore the existing sheets; they were used by me. Create a new one for your use, and give it some name. Go to Tools > Script Editor. In myFunction, assign the API key that you just copied to the variable api. Now call the analyse function once for each sender whose sentiment you want to check.
For example, if I wanted to test the sentiment of two senders, firstname.lastname@example.org and email@example.com, I’d write two calls to the analyse function. The first parameter is the name of the sheet created just for that sender, the second is the sender’s email address, and the last is the api variable (leave the last one as it is; do not change it). Remember to use a different sheet for each sender.
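Put together, myFunction might look roughly like the sketch below. The parameter order follows the description above, but the sheet names "Sender A" and "Sender B" and the key placeholder are made up, so replace them with your own. The small analyse stand-in at the top is there only so the sketch runs on its own; do not paste it into your copy, since the real analyse function already exists in the script.

```javascript
// Stand-in for the script's own analyse function, included only so this
// sketch is self-contained. It just records the calls it receives.
const calls = [];
function analyse(sheetName, senderEmail, apiKey) {
  calls.push({ sheetName: sheetName, senderEmail: senderEmail, apiKey: apiKey });
}

function myFunction() {
  // Placeholder: paste the key copied from the Indico dashboard here.
  const api = "YOUR_INDICO_API_KEY";

  // One call per sender, each writing to its own sheet.
  // "Sender A" and "Sender B" are hypothetical sheet names.
  analyse("Sender A", "firstname.lastname@example.org", api);
  analyse("Sender B", "email@example.com", api);
}
```

Only the body of myFunction is the part you would adapt.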
Click on Run > myFunction. It will ask for authorization. Authorize it, and watch as the spreadsheet is updated with the number of words, the degree of positivity and the charts. Note that if there are too many emails, you might need to add some kind of timer so as not to exceed Google’s limits on making requests.
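One simple way to read "timer" here is a short pause between requests. The sketch below shows that idea only; it is not something the shared script does as far as this post describes, and `processWithPauses` and its parameters are hypothetical names. The sleep function is passed in so that the same helper can be tried outside Apps Script too.

```javascript
// Handle items one by one, pausing between them to stay under rate limits.
// - handleItem: function called for each item (e.g. one API request per email)
// - pauseMs: how long to wait between items, in milliseconds
// - sleep: function that blocks for the given number of milliseconds
function processWithPauses(items, handleItem, pauseMs, sleep) {
  const results = [];
  items.forEach((item, index) => {
    if (index > 0) {
      sleep(pauseMs); // wait between requests, not before the first one
    }
    results.push(handleItem(item));
  });
  return results;
}
```

Inside Apps Script, you could pass `(ms) => Utilities.sleep(ms)` as the sleep argument. How long a pause you need, if any, depends on how many emails you have and on the current quotas, so treat the value as something to experiment with.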
Let me know if you face any issues.