HappyMeter: An Automated System for Real-Time Twitter Sentiment Analysis

The paper presents HappyMeter, an automated system for real-time Twitter sentiment analysis. More than 380 million tweets containing nearly 30,000 words, almost 6,000 hashtags and over 5,000 user mentions have been studied. A sentiment model is used to measure the sentiment level of each term in the contiguous United States. The system automatically mines real-time Twitter data and reveals the changing patterns of public sentiment over an extended period of time. It makes it possible to compare public opinion regarding a subject, hashtag or Twitter user across different U.S. states. Users may choose to see the overall sentiment level of a term, as well as its sentiment value on a specific day. Real-time results are delivered continuously and visualized through a web-based graphical user interface.

Keywords: Twitter; social networks; data mining; sentiment analysis


I. INTRODUCTION
Twitter has become an increasingly popular microblogging service that allows users to publish messages, a.k.a. tweets [1]. It serves as a platform for self-expression, and tweets often carry opinions on a wide range of subjects. Twitter usage is growing exponentially: Twitter has 328 million monthly active users, and over 500 million tweets are created per day [2].
The rapid growth of Twitter and the public availability of tweets have made Twitter a popular research subject. For example, researchers have examined the use of Twitter for promoting products and sharing consumer opinions [3]. Enterprises have studied the usefulness of Twitter for organizational communication and information gathering [4]. Furthermore, tweets have been monitored to detect earthquakes [5].
In this paper, we present HappyMeter, a sentiment analysis tool that measures happiness on Twitter. Sentiment analysis is the computational categorization of opinions expressed in a given text. It is especially important in social media monitoring because it provides an overview of public sentiment on particular topics.
Unlike other online text, Twitter messages have several distinctive features. First, the language used on Twitter is informal [6]; tweets often contain misspelled words, slang and acronyms [7]. Second, each tweet is limited to a maximum of 140 characters [8]. Third, Twitter covers an exceptionally broad range of topics [6]. Finally, because of the wide use of mobile devices and the rapid flow of tweets, this user-generated data reflects instant reactions as events unfold. Therefore, we designed our system to provide real-time insights into public sentiment and to show how it changes over time.
Our paper presents an automated sentiment analyzer based on Twitter traffic. The system streams geo-tagged tweets published in the contiguous United States in real time. For each tweet, a sentiment score is computed using a statistical sentiment model, and the geographical data associated with the tweet are stored. We developed a web-based graphical user interface to deliver results instantly and continuously.
The rest of the paper is organized as follows. Section II reviews related work. In Section III, we describe the data set used to build the system and introduce the methods and algorithms adopted to analyze the data. Section IV presents the results of the study and the visualizations we have built. Section V concludes the paper and proposes future directions.

II. RELATED WORK
Applications of sentiment analysis are broad and powerful. Sentiment analysis is a subfield of machine learning and natural language processing, and research in the area ranges from document-level classification [9] to determining the polarity (positive, negative or neutral) of sentences [10] and terms [11]. In recent years, sentiment analysis on Twitter in particular has attracted increasing attention from many research communities. For instance, Bollen et al. investigated whether public mood on Twitter is correlated with shifts in the stock market [12]. Vegas et al. modeled the 2016 U.S. presidential campaign in the context of Twitter [13], and Zeitzoff used Twitter data to measure social movements [14].
To determine the sentiment of a tweet, many past studies have relied on supervised learning with training data labeled according to emoticons, hashtags or both [15], [16]. Experiments show, however, that training data collected this way introduce bias into sentiment analysis [7]. Another common practice is to annotate data manually to build a training set; the obvious disadvantage of this approach is the intensive labor and time involved.
Our work is inspired by Dodds et al.'s study of temporal patterns of happiness on Twitter, in which a set of words mapped to happiness scores was used to examine sentiment variations in different expressions over time [17]. These expressions, however, consist mainly of words; other crucial elements of a tweet, such as hashtags and user mentions, have received little attention. Moreover, that study included no geographical comparison, such as exploring the public sentiment toward a term across different U.S. states. A later study by Mitchell et al. considered the geographical factor [18], but its results did not reveal changes over time, such as the sentiment toward a term in a particular state over an extended period.
www.ijacsa.thesai.org
In this paper, we present a system that shows the public sentiment toward every term on Twitter, including unigrams, hashtags and user mentions. The system computes the public sentiment of every term in each contiguous U.S. state, and the process is repeated daily. Results are visualized to reveal how sentiment changes over time and how it compares between states.

III. THE SYSTEM
The system performs sentiment analysis on a corpus of tweets collected since June 2016. In this section, we describe the data set used in the study and how we define and calculate sentiment.

A. Data Set
The Twitter Streaming API [19] allows us to collect real-time tweets and receive instant updates. We have currently gathered over 380 million geo-tagged tweets from the contiguous United States, and this number keeps growing at an average rate of 1.4 million tweets per day. Table 1 shows basic statistics of the data set. The highest volume of tweets was received on November 8, 2016, the day of the United States presidential election, when the system collected more than 2 million tweets (2,292,345 to be precise) from the contiguous U.S. As mentioned in Section II, one of the major contributions of this work is the sentiment mining of key components of tweets, such as hashtags and user mentions. Thus far, the collected tweets contain more than 29,000 unique unigrams, almost 6,000 distinct hashtags and over 5,000 different user mentions. Table 2 provides an up-to-date summary of the individual terms collected in the study. Our system performs sentiment analysis on each of these terms in the context of Twitter.
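The term counts above distinguish three kinds of terms. The sketch below shows one way such a tally could be produced; the regular-expression tokenizer, the lower-casing and the helper names `term_type` and `vocabulary_summary` are hypothetical assumptions, not the system's actual code.

```python
import re

# Hypothetical tokenizer (an assumption): @mentions, #hashtags and plain
# word tokens, all lower-cased before counting.
TOKEN_RE = re.compile(r"[@#]?\w+")


def term_type(token: str) -> str:
    """Classify a token as a user mention, a hashtag or a unigram."""
    if token.startswith("@"):
        return "mention"
    if token.startswith("#"):
        return "hashtag"
    return "unigram"


def vocabulary_summary(texts):
    """Count distinct unigrams, hashtags and user mentions across tweets."""
    seen = {"unigram": set(), "hashtag": set(), "mention": set()}
    for text in texts:
        for token in TOKEN_RE.findall(text.lower()):
            seen[term_type(token)].add(token)
    return {kind: len(terms) for kind, terms in seen.items()}
```

Sets are used so that a term contributes once to its category no matter how often it recurs across tweets.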

B. Defining Sentiment
The sentiment of a term in a Twitter message is determined by an existing sentiment lexicon, the Language Assessment by Mechanical Turk (LabMT) word list [17]. The list contains more than 10,000 of the most frequently used words, each with an average sentiment score ranging from 1 to 9. In general, happy words have high sentiment values close to 9, while sad words are associated with low scores. Table 3 shows a sample of the lexicon.
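A lexicon of this kind can be held in memory as a word-to-score dictionary. The following loader is a minimal sketch that assumes a simplified tab-separated "word, score" format; the published LabMT file has additional columns, so the parsing (and the function name `load_lexicon`) is hypothetical.

```python
def load_lexicon(lines):
    """Parse 'word<TAB>score' lines into a dict, checking the 1-9 scale.

    Hypothetical two-column format; the real LabMT file layout differs.
    """
    lexicon = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        word, score_text = line.split("\t")[:2]
        score = float(score_text)
        if not 1.0 <= score <= 9.0:
            raise ValueError(f"line {line_no}: score {score} outside 1-9")
        lexicon[word.lower()] = score
    return lexicon
```

Rejecting out-of-range scores early keeps a malformed file from silently skewing every tweet average computed later.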

C. Processing
The system collects real-time tweets through the streaming API and saves them on a server for further processing. As seen in Fig. 1, the process includes data manipulation, such as data cleaning and location identification, followed by sentiment computation. The results are then stored in a database. The rest of this section describes the processing procedure.

Twitter has a geotagging feature (Tweeting with Location) that allows users to publish a tweet with their location [20]. This feature makes tweets more contextual and, at the same time, provides valuable data for research. Note that, under Twitter's user privacy policy, users must give explicit permission for their exact location to be displayed with their tweets. Thus, not all collected tweets come with geographical data. In this study, we keep only geo-tagged tweets and, for each tweet, store the state from which it was posted.
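Assigning each tweet to a state can be sketched as follows. This sketch assumes Twitter API v1.1 tweet JSON, where a city-level `place` object carries a `full_name` such as "Austin, TX"; it handles only that place type, excludes the District of Columbia by assumption, and uses the hypothetical name `tweet_state`. The paper does not say how the authors resolved locations.

```python
# The 48 contiguous states (Alaska and Hawaii excluded; DC omitted by assumption).
CONTIGUOUS_STATES = frozenset(
    "AL AZ AR CA CO CT DE FL GA ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT "
    "NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY".split()
)


def tweet_state(tweet: dict):
    """Return the two-letter state of a geo-tagged tweet, or None to drop it.

    Assumes v1.1 tweet JSON; only city-level places ("City, ST") are handled.
    """
    place = tweet.get("place") or {}
    if place.get("country_code") != "US" or place.get("place_type") != "city":
        return None
    _, _, state = place.get("full_name", "").rpartition(", ")
    return state if state in CONTIGUOUS_STATES else None
```

Returning `None` for anything outside the contiguous states lets the caller discard untagged or out-of-scope tweets in a single check.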
As our work targets tweets in English, tweets written in other languages are discarded, and non-English characters within tweets are erased. Because of the informal language on Twitter (mentioned in Section I), misspelled words occur frequently. The system therefore cleans the data by removing words that do not exist in the sentiment lexicon. Hashtags and user mentions, however, are kept in the data set.
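The cleaning rules above can be sketched as a single filter. Two choices here are assumptions rather than details from the paper: the language check uses the `lang` field of the tweet JSON, and "non-English characters" is interpreted as non-ASCII characters. The tokenizer and the name `clean_tweet` are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"[@#]?\w+")  # hypothetical tokenizer


def clean_tweet(tweet: dict, lexicon: dict):
    """Return the kept terms of an English tweet, or None if it is discarded."""
    if tweet.get("lang") != "en":  # assumption: rely on the tweet's 'lang' field
        return None
    # Assumption: erase "non-English characters" by dropping non-ASCII ones.
    text = tweet.get("text", "").encode("ascii", "ignore").decode("ascii")
    terms = []
    for token in TOKEN_RE.findall(text.lower()):
        # Hashtags and user mentions are always kept; plain words are kept
        # only if the sentiment lexicon contains them.
        if token[0] in "@#" or token in lexicon:
            terms.append(token)
    return terms
```

Misspellings such as "xyzzy" disappear because they are absent from the lexicon, while `#got` and `@hbo` survive for the term-level analysis.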
To compute the sentiment of a tweet, the system takes a simple average of the sentiment scores of its words. Hashtags and user mentions are excluded from this process. Moreover, following Dodds et al. [17], neutral stop words, whose sentiment values fall between 4 and 6, are also excluded from the calculation. Take the following tweet as an example: "@missnemmanuel is so gorgeous! #GoTS7e2 #GoT #newcrush". Among its terms, "is" has a sentiment score of 5.18, while "so" and "gorgeous" have sentiment values of 5.08 and 7.42, respectively. After hashtags, user mentions and neutral stop words are discarded, only the word "gorgeous" remains in the calculation. Therefore, the tweet is assigned the average of 7.42, which is simply 7.42 itself.
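The averaging rule and the worked example can be reproduced in a few lines. This is a sketch, not the system's code: the tokenizer and the name `tweet_sentiment` are hypothetical, the neutral band is treated as strictly between 4 and 6 (the paper does not state how the boundaries are handled), and only the three lexicon scores quoted in the text are included.

```python
import re
from statistics import mean

TOKEN_RE = re.compile(r"[@#]?\w+")  # hypothetical tokenizer

# Scores quoted in the text; a full run would load the complete LabMT list.
LEXICON = {"is": 5.18, "so": 5.08, "gorgeous": 7.42}


def tweet_sentiment(text, lexicon, neutral=(4.0, 6.0)):
    """Average the lexicon scores of a tweet's non-neutral words.

    Hashtags and user mentions are skipped, as are words whose score lies
    strictly inside the neutral band. Returns None if no word qualifies.
    """
    low, high = neutral
    scores = [
        lexicon[token]
        for token in TOKEN_RE.findall(text.lower())
        if token[0] not in "@#"
        and token in lexicon
        and not low < lexicon[token] < high
    ]
    return mean(scores) if scores else None


print(tweet_sentiment("@missnemmanuel is so gorgeous! #GoTS7e2 #GoT #newcrush", LEXICON))
# prints 7.42
```

Returning `None` for tweets with no qualifying word is an assumption; such tweets then carry no score into the term-level averages.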
The system associates this computed sentiment value with every term in the tweet, including hashtags, user mentions and even stop words. Thus, in the previous example, each of the following terms receives a sentiment score of 7.42 from the tweet: @missnemmanuel, is, so, gorgeous, #GoTS7e2, #GoT and #newcrush.
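Propagating the tweet score to its terms is a simple mapping. The sketch below assumes the text has already been cleaned as described earlier; counting a term once per tweet even if it repeats, and lower-casing terms, are assumptions, and `term_scores` is a hypothetical name.

```python
import re

TOKEN_RE = re.compile(r"[@#]?\w+")  # hypothetical tokenizer


def term_scores(text, tweet_score):
    """Attach the tweet's sentiment to every distinct term it contains.

    Words, hashtags, user mentions and stop words all receive the same value.
    Tweets without a score (no qualifying word) contribute nothing.
    """
    if tweet_score is None:
        return {}
    return {token: tweet_score for token in TOKEN_RE.findall(text.lower())}
```

Applied to the example tweet with a score of 7.42, every one of its seven terms, including "is" and "so", is paired with 7.42.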
Each of these terms is likely to appear in other tweets as well. The system collects the sentiment scores of a term across all of its occurrences in a day and computes their mean. For instance, if @missnemmanuel is mentioned 10,000 times in one day, the system gathers the sentiment values of those 10,000 tweets and computes their average. In this work, we examine the daily sentiment of a term in the contiguous United States as a whole, as well as in each state.
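The daily aggregation can be kept cheap by storing a running sum and count per key instead of every score. The sketch below is one such design under stated assumptions: the record layout, the `"US"` key for the national aggregate and the name `daily_term_means` are hypothetical.

```python
from collections import defaultdict

NATIONAL = "US"  # hypothetical key for the contiguous U.S. as a whole


def daily_term_means(records):
    """Average tweet sentiment per (day, region, term).

    records yields (day, state, tweet_score, terms) tuples, one per tweet.
    Each tweet counts toward its own state and the national aggregate.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, count]
    for day, state, score, terms in records:
        for term in set(terms):  # count a term once per tweet (assumption)
            for region in (state, NATIONAL):
                entry = totals[(day, region, term)]
                entry[0] += score
                entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

For example, two New York tweets mentioning @nfl with scores 7.0 and 5.0 give @nfl a New York mean of 6.0 for that day.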
One may question the need to compute the sentiment of a neutral word. It may not seem necessary in the previous example. But consider the word "governor". According to the lexicon, it has a sentiment value of 5.14, which falls in the range of a neutral stop word. However, it is interesting to see whether some states express a higher sentiment toward "governor" than others do, and especially to observe how this changes over time.
Another possible concern is the capacity and scalability of the system. After all, there are millions of potential user mentions and hashtags on Twitter, and new ones emerge every second. Keeping a daily sentiment record for all of them would require tremendous storage space and computing power. To tackle this issue, we set a threshold of 3 as the minimum number of daily occurrences of a term: hashtags and usernames mentioned fewer than three times in a day across the contiguous U.S. are discarded from the database. We believe that this method helps retain only active hashtags and popular user mentions.
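The pruning rule can be sketched with a counter over one day's term occurrences. Applying the threshold only to hashtags and user mentions follows the second sentence of the paragraph above; this reading, and the name `prune_rare_terms`, are assumptions.

```python
from collections import Counter

MIN_DAILY_OCCURRENCES = 3  # threshold stated in the text


def prune_rare_terms(day_terms, min_count=MIN_DAILY_OCCURRENCES):
    """Return the terms to keep for one day across the contiguous U.S.

    day_terms is an iterable of every term occurrence observed that day.
    Only hashtags and user mentions are subject to the threshold; plain
    words are already limited to the lexicon vocabulary (an interpretation).
    """
    counts = Counter(day_terms)
    return {
        term
        for term, count in counts.items()
        if count >= min_count or not term.startswith(("@", "#"))
    }
```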

IV. RESULTS
In this section, we demonstrate our system and present results from the analysis. We first studied the number of daily tweets issued from each contiguous state. To account for differences in population between states, we calculated the average number of tweets published per day per 10,000 people in each state. Population estimates were retrieved from the United States Census Bureau [21]. Fig. 2 shows the results after applying Jenks natural breaks optimization [22], which divides the values, ranging from 18 to 80, into five groups. Among the contiguous U.S. states, Louisiana produces the most tweets per day per capita, while Wyoming produces the fewest, as seen in Table 4.

Using the methodology introduced in Section III, we investigated the overall sentiment value of each U.S. state based on the tweets collected from that state. Results range from 5.92 to 6.03 with a small standard deviation, and the average sentiment value for the contiguous United States as a whole is 5.96. In our study, West Virginia and Wisconsin have the highest average sentiment values, while Alabama, Arkansas, Vermont and Virginia share the lowest. Table 5 summarizes these statistics. As in Fig. 2, the Jenks classification algorithm was used to visualize the variation of average sentiment values among U.S. states; Fig. 3 shows the results with five classes.

This paper also examines the overall sentiment level of each word tweeted in the network. As the histogram in Fig. 4 shows, most words (nearly 75%) have an average sentiment score between 6 and 7, which is considered positive, and more than 21% have an overall sentiment value between 5 and 6. The highest sentiment score of a word is 8.01 and the lowest is 3.73.

Recall from Section III that a sentiment lexicon built with Language Assessment by Mechanical Turk (LabMT) was used to determine the initial sentiment of each standalone word. Each term was then processed by HappyMeter to obtain its overall sentiment in the context of tweets. Table 6 compares the sentiment values before and after processing by our system. As shown in the table, the mean sentiment is higher in the Twitter context. Moreover, the standard deviation drops substantially, meaning there are fewer extreme ratings. In general, contextual sentiment is higher and milder. The two sets of scores are strongly correlated, with a Pearson's correlation coefficient of 0.73.

Table 7 shows the top 10 words, hashtags and user mentions with the highest sentiment level. As the top words and hashtags show, tweets about birthdays, flowers, beaches, karaoke and puppies tend to be the happiest. The best-rated Twitter users are mostly movie and television stars, singers and comedians. Table 8, on the other hand, lists the top 10 words, hashtags and user mentions with the lowest sentiment values. As shown in the table, apart from extreme words such as murder, kill and dead, traffic stands out as the most negative topic of everyday life. User mentions associated with low sentiment values mainly belong to news media accounts.
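The before/after comparison summarized in Table 6 (means, standard deviations and Pearson's correlation) can be sketched as follows. The function names are hypothetical, Pearson's coefficient is written out explicitly rather than taken from a library, and the sketch does not reproduce the paper's numbers.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def compare_scores(lexicon_scores, contextual_scores):
    """Summarize lexicon vs. contextual scores for words present in both."""
    words = sorted(lexicon_scores.keys() & contextual_scores.keys())
    before = [lexicon_scores[w] for w in words]
    after = [contextual_scores[w] for w in words]
    return {
        "mean_before": mean(before),
        "mean_after": mean(after),
        "sd_before": stdev(before),
        "sd_after": stdev(after),
        "pearson_r": pearson(before, after),
    }
```

A higher `mean_after` together with a lower `sd_after` corresponds to the "higher and milder" pattern described above.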
In this work, we also examined the frequency of each term on the Twitter network. Table 9 lists the most frequently occurring words during the observation period, along with their sentiment scores. Note that neutral stop words (mentioned in Section III) have been excluded from the list. For example, "just" is the most frequently used word, appearing over 12 million times in our data set; however, it is not included in the list because it tells us little about the public state of mind. We are happy to report that all of the top 10 words have positive polarity, with sentiment values greater than 6. To further investigate language use on Twitter, we built a histogram of word usage frequencies in the data set, shown in Fig. 5. Among the 380 million tweets we have received, 35% of the English words appeared between 10,000 and 100,000 times. Interestingly, 26% of the words occurred fewer than 100 times. This is consistent with the casual language style and rapidly changing vocabulary of Twitter.

We also studied the most frequently occurring hashtags, shown in Table 10 along with their average sentiment values. Job-related subjects, such as #job and #hiring, rank at the top of the list, far ahead of the rest. Hashtag #traffic ranks 15th with 426,309 mentions and #trump ranks 18th with 377,559 appearances (not shown in Table 10). Hashtag #education appears further down the list, ranking 34th with 206,122 tags. Among geographical hashtags, New York attracts the most attention when its two hashtags are combined (253,696 uses of #newyork and 228,345 of #nyc), followed by #houston with 369,117 occurrences and #chicago with 260,625 tags. The complete list of rankings is available upon request.

Table 11 lists the top 10 most mentioned Twitter users and their overall sentiment scores. As we can see, politics-related accounts dominate the list. The United States president Donald Trump (@realdonaldtrump) was mentioned more than 2.6 million times during our observation period, nearly three times as often as the second-ranked account, that of Hillary Clinton, his former opponent in the presidential campaign. User @potus (President of the United States) ranks 4th and Kellyanne Conway (@kellyannepolls) holds 9th place. The rest of the list consists mainly of news media accounts, such as Fox News (@foxnews), CNN (@cnn), the New York Times (@nytimes) and MSNBC (@msnbc). Sean Hannity (@seanhannity), the radio and television host from Fox News, has also been frequently mentioned by the Twitter community, ranking 10th in the list. The only account in the top 10 that is not politics-related is YouTube (@youtube), which holds 6th place with over 250,000 mentions. The second most popular non-political account is @nfl (National Football League), which received fewer than 100,000 mentions and ranks 20th.

The analysis results of the system are visualized through a web-based graphical user interface available at www.happymeter.us. The dashboards display the Twitter sentiment map of a given term, the rankings of the contiguous U.S. states from the highest to the lowest sentiment, and charts that reveal temporal patterns. An example dashboard for Twitter user @nfl is shown in Fig. 6.
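The frequency rankings in Tables 9 to 11 amount to counting term occurrences per category and taking the most common entries. The sketch below assumes a flat stream of term occurrences and excludes neutral stop words from the word ranking only. The function name is hypothetical, and the test values are illustrative, not LabMT scores or results from the paper.

```python
from collections import Counter


def top_terms(occurrences, lexicon, k=10, neutral=(4.0, 6.0)):
    """Rank the most frequent words, hashtags and user mentions.

    Words whose lexicon score lies inside the neutral band are left out of
    the word ranking, mirroring the exclusion of neutral stop words.
    """
    low, high = neutral
    counts = Counter(occurrences)
    words, hashtags, mentions = Counter(), Counter(), Counter()
    for term, n in counts.items():
        if term.startswith("#"):
            hashtags[term] = n
        elif term.startswith("@"):
            mentions[term] = n
        elif term in lexicon and not low < lexicon[term] < high:
            words[term] = n
    return {
        "words": words.most_common(k),
        "hashtags": hashtags.most_common(k),
        "mentions": mentions.most_common(k),
    }
```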
The sentiment map shows the average sentiment score of a selected term in each contiguous state. To better understand the geography of public opinion on a subject, we applied Jenks natural breaks optimization to cluster the states into three classes. States with higher sentiment scores form the (relatively) positive group, states with lower sentiment values form the (relatively) negative group, and the remaining states form the neutral group. On the sentiment map, green marks the positive group, while states in the neutral and negative groups are colored yellow and red, respectively. The map is interactive: hovering the mouse over a state displays its sentiment score and polarity. Fig. 7 shows an example sentiment map for Twitter user @realdonaldtrump.

Besides the sentiment map and rankings, the dashboards contain two charts: the sentiment score chart and the occurrence chart. The sentiment score chart reveals the temporal pattern of a chosen term over an extended period of time. Users can choose to view the overall pattern in the contiguous United States as a whole or the sentiment trend in a particular state. Fig. 8 shows an example sentiment score chart of hashtag #job in the state of New York. As one can see, selecting a data point on the interactive chart displays the date and the sentiment value on that specific day.
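The three-class map coloring can be sketched with the textbook Fisher-Jenks dynamic program, which chooses breaks that minimize the total within-class sum of squared deviations. This is not necessarily the implementation the authors used, and `jenks_breaks`, `color_states` and the test scores are hypothetical and illustrative.

```python
def jenks_breaks(values, n_classes):
    """Fisher-Jenks natural breaks via dynamic programming.

    Returns n_classes + 1 break values: the minimum followed by the upper
    bound of each class, chosen to minimize the total within-class sum of
    squared deviations.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= len(values)")
    # Prefix sums give the squared-deviation cost of any slice in O(1).
    s1, s2 = [0.0], [0.0]
    for x in data:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):  # within-class squared deviation of data[i:j]
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    best[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is data[i:j]
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j], start[k][j] = c, i
    # Walk back through the stored split points to recover class bounds.
    uppers, j = [], n
    for k in range(n_classes, 0, -1):
        uppers.append(data[j - 1])
        j = start[k][j]
    return [data[0]] + uppers[::-1]


COLORS = ("red", "yellow", "green")  # negative, neutral, positive


def color_states(state_scores):
    """Map each state to a map color using three Jenks classes."""
    breaks = jenks_breaks(state_scores.values(), 3)
    return {
        state: COLORS[next(c for c in range(3) if score <= breaks[c + 1])]
        for state, score in state_scores.items()
    }
```

With 48 or 49 states the cubic-time loop is negligible, so an exact method is affordable for every term on every day.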

V. CONCLUSIONS AND FUTURE WORK
This paper presents HappyMeter, a real-time data processing infrastructure for evaluating changes in public sentiment on Twitter. The system examines every term tweeted in the contiguous United States and computes its sentiment score on a scale of 1 to 9. Daily analysis has been conducted for the contiguous U.S. as a whole as well as for each state. Over 40,000 terms extracted from more than 380 million tweets have been studied, including words, hashtags and user mentions. The system shows the sentiment map and state rankings for each given term, and sentiment charts are automatically generated to reveal the changing pattern of public sentiment toward a term in the nation or in a selected state.
The study also investigates the number of daily tweets published in each state, as well as each state's overall sentiment. It further reports findings on word frequencies, the terms with the highest and lowest sentiment values, and the most frequently tweeted words, hashtags and users. The complete analysis results can be made available upon request.
One limitation of the study is that the sentiment lexicon used in the experiment does not cover all terms. Because of the informal language on Twitter, new slang terms, abbreviations and acronyms, many of them Twitter-specific, are created every day. For example, "twitterati" is a popular term in the Twitter community that refers to prominent Twitter users. Future work includes designing a mechanism to update the lexicon regularly in order to expand its vocabulary.
Another limitation of the presented work is that context is not taken into account when calculating the sentiment value of a tweet. Our system determines the sentiment by averaging the sentiment scores of individual unigrams. This method performs well in most cases, especially when the data set is large. However, an average sometimes fails to reflect the true sentiment of a tweet, particularly when a sentence contains a double negative or is ironic. In the future, we plan to investigate sentiment scores of n-grams, specifically phrases, in order to achieve higher accuracy.

Fig. 2. Average number of daily tweets per 10,000 people in the contiguous United States.

Fig. 3. Average sentiment scores of states in the contiguous United States

Fig. 8. Twitter sentiment score chart of #job in New York State.

The last dashboard component is the occurrence chart. In addition to sentiment values, the system keeps track of the daily occurrences of each Twitter term. As with the sentiment score chart, users can view the chart for the contiguous U.S. as a whole or for each state. Fig. 9 shows the Twitter occurrence chart of the term "trump" across the contiguous U.S. The highest point in the figure represents a burst of tweets mentioning "trump" on November 8, 2016, the day of the U.S. presidential election. The system uses Apache Storm to gather real-time tweets and process the data. Records of sentiment values are stored in a Redis database, and the graphical web interface is developed in Python and JavaScript.

TABLE III. SAMPLE OF THE SENTIMENT LEXICON

TABLE IV. NUMBER OF DAILY TWEETS PER 10,000 PEOPLE

TABLE VI. CHANGE OF SENTIMENT SCORES BEFORE AND AFTER HAPPYMETER

TABLE IX. TOP 10 MOST FREQUENTLY OCCURRING WORDS (EXCLUDING STOP WORDS)

TABLE X. TOP 10 MOST FREQUENTLY OCCURRING HASHTAGS

TABLE XI. TOP 10 MOST FREQUENTLY OCCURRING USER MENTIONS