Emotion Analysis of Arabic Tweets during COVID-19 Pandemic in Saudi Arabia

Social media has emerged as an effective platform for investigating people's opinions and feelings about crisis situations. The coronavirus crisis has brought out a range of emotions, including anger, sadness, fear, trust, and anticipation. In this paper, we investigate the public's emotional responses to the pandemic, using Twitter as the platform for our analysis. We investigate how emotional perspectives vary regarding the end of the lockdown in Saudi Arabia. We develop an emotion detection method that classifies tweets into the eight standard emotions. Furthermore, we present insights into how the intensity of these emotions changes over time. Our findings show that joy and anticipation are the most dominant emotions. While people express positive emotions, tones of fear, anger, and sadness are also present. This research might help in better understanding public behavior, gaining insight, and making appropriate decisions.
Keywords—Emotion analysis; Arabic tweets; COVID-19; Twitter; Lexicon-based


I. INTRODUCTION
Coronavirus disease 2019 (COVID-19) is an illness caused by a novel coronavirus designated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. It was first traced back to November 2019 at a wholesale fish and seafood market, also known as a wet market, in Wuhan City, Hubei Province, China. It was reported to the World Health Organization (WHO) China Country Office on December 31, 2019. On January 11, 2020, WHO announced that Chinese authorities had not found clear evidence of human-to-human transmission of 2019-nCoV [2]. With the rapid increase in confirmed cases, on January 30, 2020, WHO declared the COVID-19 outbreak a global health emergency. By the time WHO declared COVID-19 a global pandemic on March 11, 2020 [3], the number of confirmed cases was growing exponentially and the novel coronavirus had spread across the world. In line with the precautionary measures taken by China to control the outbreak, WHO recommended social distancing and self-quarantine [4]. The Chinese government locked down many affected cities, and the number of new cases fell to zero on March 18, 2020. Many countries followed the Chinese government's procedures and imposed 21-day lockdowns, the biggest isolation ever to occur in the world [4]. The first case of coronavirus infection in Saudi Arabia was announced by the Ministry of Health (MOH) on March 2, 2020. Since then, Saudi Arabia has taken firm precautionary and preventive measures to combat the COVID-19 outbreak and limit its spread among citizens and residents across the country [5]. The economic, psychological, and social impact of the full lockdown was severe, and countries had to make the difficult decision to reopen gradually [6], [7], [8], [9].
Social media platforms are now considered one of the best available sources for analyzing and detecting human emotion [10], [11]. During the lockdown, and shortly after the early outbreak, social media platforms such as Twitter became many people's source of information on several subjects related to COVID-19. This encouraged researchers to study and analyze people's reactions on Twitter to the pandemic and its implications [4], [12], [13], [14]. The study in [4] analyzed positive and negative sentiment in COVID-19 tweets from twelve countries between March 11 and March 31. The authors of [12] proposed a statistical analysis model for tweets posted during February and March and found an increase in the number of individual users tweeting about coronavirus, COVID-19, and Wuhan. In [13], the analysis shows that Indian people felt positive about their government's lockdown decision. The authors of [14] performed sentiment analysis on coronavirus tweets and studied the evolution of fear over time.
Most research has studied the COVID-19 crisis during the outbreak. Emotions have also been studied, but only across a limited range of categories. In this research, we present a textual analysis of Arabic tweets to detect public emotions in Saudi Arabia during the phase in which the lockdown ended. Our proposed method classifies and quantifies tweets according to eight emotions, namely anticipation, anger, disgust, joy, fear, surprise, sadness, and trust. One key contribution of this research is a system that labels and scores Arabic text according to the standard emotion categories. Another is an analysis of Saudi people's perception of COVID-19 that gives insight into their feelings and reactions. Our findings demonstrate that joy was initially the most dominant emotional tone, while anticipation dominated later. We also gain insight into how emotion intensity changes over time.
The rest of this paper is organized as follows. Section 2 reviews related work, and Section 3 introduces our dataset and preprocessing. Section 4 describes our emotion detection method. Results and discussion are presented in Section 5, followed by the conclusion in Section 6.

II. RELATED WORK
People share their thoughts on social media platforms such as Twitter, expressing their moods, emotions, and sentiments. Twitter is fertile ground that helps researchers analyze and understand individuals' attitudes and behaviors. Tweets that convey information about users' mood states can be categorized into two types: personal tweets and information-sharing tweets. Tweets extracted over a given period can reflect changes in the general mood. Moods are much easier to understand through facial expressions, gestures, and tone of voice than through written words [15], and analyzing written words has captured the attention of psychologists and social scientists [16], [17], [18]. Unlike sentiment analysis (SA), which classifies a text as positive, negative, or neutral, emotion analysis (EA) studies how to detect the emotions conveyed in texts (e.g., sadness, happiness, optimism) [19]. EA can detect a wider range of emotions than the eight fundamental emotions [20]. Existing studies have focused more on sentiment analysis than on detecting emotion in text [19]. In the following, we provide a brief review of related work on EA.
Studies [21], [22] investigated and detected depressive disorders among Twitter users. An analysis of personal updates from 69 users in [21] showed that words related to negative emotion and anger in tweets can signal depression in individuals. The authors of [22] built a statistical classifier to estimate depression, and their model was able to predict depression with an accuracy of about 70%. In [15], the authors studied the impact of major social and economic events on public mood, using an extended version of the Profile of Mood States, a psychometric instrument, to obtain six mood dimensions: tension, depression, anger, vigor, fatigue, and confusion. The authors of [23] achieved 85% accuracy using a rule-based approach to classify a large set of tweets into four emotion classes. Studies [24], [10] used lexical approaches to detect emotion. The authors of [24] built a large lexicon for the primitive emotions joy, surprise, disgust, anger, sadness, and fear. However, they used a manually created training set, and their classifier was limited to news headlines. In contrast, [10] built a lexicon of more than 200 moods. Researchers in [25] manually annotated emotions and sentiments in a limited set of sentences from news articles. The study in [26] used an unsupervised learning approach based on a convolutional neural network architecture to detect and identify emotions in Twitter messages. The authors of [27] classified emotion in social media text into six emotions (happiness, sadness, fear, anger, surprise, and disgust) in two steps. First, they extracted emotions using natural language processing. Then, they applied support vector machine (training accuracy 91.7%) and J48 (training accuracy 85.4%) classifiers to 900 tweets and created a large set of words describing both emotions and word intensities.
Many factors affect the detection and analysis of emotion in texts, including spelling mistakes, the use of emoticons, and slang expressions [28], [29]. In addition, the language itself and its complexity, as with Arabic, make the analysis and preprocessing phases difficult. Thus, there are few studies on analyzing and detecting emotion in Arabic tweets. The authors of [30] proposed a model using the Waikato Environment for Knowledge Analysis that categorized Arabic tweets into four emotions: sadness, joy, disgust, and anger. Their model achieved 80% accuracy. The authors of [31] applied a binary classifier to detect irony in Arabic tweets, achieving 72.76% accuracy. The study in [32] used ensemble learning methods to classify Arabic tweets into five topic categories: sports, politics, culture, general topics, and technology. The results show that ensemble methods outperform other classification models such as decision trees, Naïve Bayes, and sequential minimal optimization (SMO).
In fact, most work on Arabic tweet analysis addresses sentiment analysis. In contrast, our work performs emotion analysis on Arabic tweets to detect people's emotions and identify changes in their moods.

III. DATA COLLECTION AND PREPROCESSING
The study began with data collection, cleaning, and preparation, which we explain in the following.

A. Dataset Collection
The data were obtained using Tweepy [33], a Python library for accessing the Twitter API. We collected Arabic tweets posted from July 1 to July 31, 2020, using the multiple keywords shown in Table I to ensure that the corpus covers the COVID-19 crisis. We collected 1,828,229 Arabic tweets; the dataset was then filtered to focus on tweets belonging to Saudi Arabia, resulting in 600,640 tweets. These tweets reflect the discussion around the coronavirus after the lockdown and quarantine ended.
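The text does not say how tweets were attributed to Saudi Arabia. The sketch below assumes one plausible heuristic over the standard Twitter API v1.1 tweet JSON (for example, the `_json` attribute of a Tweepy `Status` object): keep a tweet if its `place.country_code` is `"SA"`, or if the user's free-text profile `location` mentions a Saudi place name. The place-name list and the function names are hypothetical.

```python
# Hypothetical place names; the paper does not list its filtering criteria.
SAUDI_LOCATIONS = ("السعودية", "الرياض", "جدة", "مكة", "الدمام",
                   "saudi", "riyadh", "jeddah", "mecca", "dammam")


def is_saudi_tweet(tweet: dict) -> bool:
    """Heuristic: geotagged in SA, or a Saudi name in the profile location."""
    place = tweet.get("place") or {}          # "place" is often null
    if place.get("country_code") == "SA":
        return True
    # Profile locations are free text and may be missing or null.
    location = ((tweet.get("user") or {}).get("location") or "").lower()
    return any(name in location for name in SAUDI_LOCATIONS)


def filter_saudi(tweets):
    """Keep only the tweets that the heuristic attributes to Saudi Arabia."""
    return [t for t in tweets if is_saudi_tweet(t)]
```

Geotagged tweets are rare and profile locations are self-reported, so a heuristic like this trades some precision and recall for simplicity.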

B. Data Preprocessing
The collected tweets may contain a lot of noisy and uninformative data, which would make the analysis and classification more complex and less accurate. Therefore, we applied the following preprocessing steps to clean the raw text.
• Remove tweet features: features such as hashtags, mentions (@user), URLs, and the retweet symbol (RT) do not have any impact on sentiment classification, as reported in [34], [35]. We removed these non-sentiment features from each tweet.
• Replace repeated letters: users sometimes repeat a letter within a word, such as ( ), to emphasize a word or feeling. Removing these repetitions is important for identifying the word during classification. We replaced the repeated letters with a single letter.
• Remove stop words: stop words are usually filtered out because they are considered neutral in polarity and are not useful for the polarity decision [36]. Arabic stop words such as ( ) were removed using the Python NLTK stop word list.
• Remove repeated tweets: during extraction, the API may return duplicate tweets. We removed them to avoid giving extra weight to specific tweets [37].
• Replace emojis with special tokens: Twitter users today often use emojis to express feelings, moods, and emotions [38]. Since our data contain emojis, we treat each emoji as a token using the emoji sentiment lexicon from [39].
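The preprocessing steps above can be sketched with Python's standard `re` module. This is a minimal illustration, not the authors' code. The stop word set and the emoji-to-token mapping are tiny hypothetical placeholders: the paper used NLTK's Arabic stop word list (available as `nltk.corpus.stopwords.words("arabic")` once the corpus is downloaded) and the emoji sentiment lexicon of [39]. One deliberate deviation is also assumed: only runs of three or more identical letters are collapsed, because some ordinary Arabic words, such as "الله" and "ممكن", legitimately contain two identical consecutive letters.

```python
import re

# Hypothetical stand-ins for NLTK's Arabic stop words and the emoji lexicon [39].
STOP_WORDS = {"في", "من", "على", "عن"}
EMOJI_TOKENS = {"😂": "EMOJI_JOY", "😢": "EMOJI_SAD"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\S+")
RT_RE = re.compile(r"\bRT\b")
# A letter repeated three or more times in a row (assumed threshold).
REPEAT_RE = re.compile(r"(\w)\1{2,}")


def clean_tweet(text: str) -> list[str]:
    """Apply the preprocessing steps to one tweet and return its tokens."""
    # Remove URLs, mentions, hashtags, and the retweet marker.
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, RT_RE):
        text = pattern.sub(" ", text)
    # Collapse letters repeated for emphasis into a single letter.
    text = REPEAT_RE.sub(r"\1", text)
    # Map known emojis to special tokens, padded so they split cleanly.
    for emoji, token in EMOJI_TOKENS.items():
        text = text.replace(emoji, f" {token} ")
    # Whitespace tokenization, then stop word removal.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def deduplicate(tweets: list[str]) -> list[str]:
    """Drop exact duplicate tweets while keeping first-seen order."""
    return list(dict.fromkeys(tweets))
```

For example, `clean_tweet("RT @user رااااائع 😂 في https://t.co/x #كورونا")` yields the word "رائع" followed by the `EMOJI_JOY` token; the mention, hashtag, URL, retweet marker, and stop word are gone.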

IV. METHOD
In this research, we explored people's emotions regarding COVID-19 using tweet data. We developed an emotion detection algorithm that classifies Arabic tweets into eight emotion categories, namely anger, fear, anticipation, trust, surprise, sadness, joy, and disgust, as considered in the NRC lexicon [40]. We also included a non-emotion class, neutral. We used the NRC lexicon for Arabic, which provides, for each entry, the English word, its Arabic translation, the emotion for which an intensity score is given, and the word's emotion intensity score. This lexicon contains 9,922 words distributed over these emotions. Our emotion detection approach uses Natural Language Processing (NLP) to score and classify each tweet according to the emotions it contains. The proposed method is lexicon-based: it calculates the sentiment of a text from the polarity of the words or phrases in that text [41].
Our emotion detection algorithm operates on the tweets produced by the preprocessing step, and the code was developed in Python. The approach includes the following steps:
• Annotating the emotional words: first, we used the Natural Language Toolkit (NLTK) library in Python [42] to split the tweets into tokens. Then, the tokens are checked against the words in the NRC lexicon. Only tokens that match a lexicon entry are annotated as emotion words, while tokens that name a person, location, or time are not considered emotion words. All matched words, together with their associated emotions and weights, are stored for the scoring step.
• Scoring: every stored word contributes its weight to the score of its emotion category. For each tweet, the score for each category is calculated by summing the weights of all words belonging to that category and dividing by the number of such words, as shown in (1), where E_j is the j-th emotion category (j = 1, 2, ..., 8), Emoweight_i is the weight of the i-th word, and n_w is the number of words in the tweet that belong to category j.

$$\mathrm{EmoScore}(E_j) = \frac{\sum_{i=1}^{n_w} \mathrm{Emoweight}_i}{n_w} \qquad (1)$$

• Classification: the tweet is assigned the emotion category with the highest score. Furthermore, the proportion score ProScore is calculated as shown in (2) to estimate the percentage of each emotion in a tweet.
If a tweet contains no emotion words, the scores of all emotion categories are zero and the tweet is classified as neutral.
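The annotation, scoring, and classification steps can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes that the lexicon is available as tab-separated lines with the four fields described above (English word, Arabic translation, emotion, intensity score) and that tweets have already been preprocessed into tokens. The sample entries and weights in the usage example are invented. Two further assumptions apply. First, the filtering of person, location, and time tokens is omitted. Second, because equation (2) is not reproduced in the text, ProScore is assumed here to be each category's score divided by the sum of all eight scores.

```python
from collections import defaultdict

# The eight emotion categories used by the lexicon.
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")


def load_lexicon(lines):
    """Map each Arabic word to its list of (emotion, intensity) pairs.

    Assumes one tab-separated entry per line: English word, Arabic
    translation, emotion, intensity score (no header line).
    """
    lexicon = defaultdict(list)
    for line in lines:
        _english, arabic, emotion, score = line.rstrip("\n").split("\t")
        lexicon[arabic].append((emotion, float(score)))
    return dict(lexicon)


def emo_scores(tokens, lexicon):
    """Equation (1): mean intensity of the matched words in each category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for token in tokens:
        # Tokens absent from the lexicon are not emotion words and are skipped.
        for emotion, weight in lexicon.get(token, ()):
            totals[emotion] += weight
            counts[emotion] += 1
    return {e: totals[e] / counts[e] if counts[e] else 0.0 for e in EMOTIONS}


def classify(tokens, lexicon):
    """Label one preprocessed tweet; returns (label, scores, proportions)."""
    scores = emo_scores(tokens, lexicon)
    total = sum(scores.values())
    if total == 0:
        # No emotion words matched: every score is zero, so the tweet is neutral.
        return "neutral", scores, {e: 0.0 for e in EMOTIONS}
    # Assumed form of ProScore (equation (2) is not shown in the text).
    proportions = {e: s / total for e, s in scores.items()}
    # The category with the highest EmoScore becomes the tweet's label.
    label = max(scores, key=scores.get)
    return label, scores, proportions


if __name__ == "__main__":
    rows = [
        "happy\tسعيد\tjoy\t0.80",          # invented weights, illustration only
        "fear\tخوف\tfear\t0.90",
        "hope\tأمل\tanticipation\t0.60",
        "hope\tأمل\tjoy\t0.40",
    ]
    lexicon = load_lexicon(rows)
    label, scores, proportions = classify(["سعيد", "خوف", "اليوم"], lexicon)
    print(label)  # fear: its score 0.90 beats joy's 0.80
```

Ties between categories are broken by the fixed order of `EMOTIONS`, because Python's `max` returns the first maximal key. The paper does not say how ties were handled.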

V. RESULTS AND DISCUSSION
This section presents visualizations and an analysis of our findings on the coronavirus tweet data.
A total of 600,640 tweets were analyzed. Table II shows the counts and percentages of the eight emotions. Our findings show that the tweets trended toward positive emotions, with notable tones of joy and trust, while the most notable negative emotions were fear and anger. Fig. 1 illustrates the counts of the emotional tones. Joy registers the highest percentage (15.9%), followed by anticipation (14.97%). Fear and trust show very close values of 11.78% and 11.06%, respectively. Anger, disgust, and sadness register low percentages, and surprise has the lowest. These findings imply that people expressed joy about the end of the lockdown, perhaps because they were eager to return to the lives they had before. They might have felt great relief at being able to do what they had missed over the past few months. The second highest value, anticipation, suggests that people were uncertain about what the future holds, as the spread of the virus is hard to predict. Moreover, after the lockdown, people might have been experiencing a range of emotions because new cases were still being registered daily.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 10, 2020
In our analysis, we also compared each emotional tone with the other tones. For each tone, we calculated the ratio of its count to the total count of all tones, so each colored segment of a bar in Fig. 2 is proportional to that tone's share of the total. The analysis was done for each day of July. We observed that joy was the most dominant emotion at the beginning and in the middle of the month, followed by anticipation, whereas at the end of the month anticipation dominated. Levels of fear were higher than those of anger, as the figure shows.
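The tallies behind Table II and the daily ratios behind Fig. 2 can be sketched as below. The sketch assumes that each tweet already carries a single predicted label (one of the eight emotions or neutral) and, for the daily view, the day it was posted. The text does not state whether the percentages are taken over all tweets or over emotional tweets only. Here, the overall distribution is computed over all labels, including neutral, and the daily shares over the eight tones only. All function names are hypothetical.

```python
from collections import Counter, defaultdict

EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")


def emotion_distribution(labels):
    """Count and percentage of each label over all analyzed tweets."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.most_common()}


def daily_tone_shares(records, tones=EMOTIONS):
    """records: iterable of (day, label) pairs; returns {day: {tone: share}}.

    Neutral tweets are skipped, so each day's shares sum to 1 across the tones.
    """
    per_day = defaultdict(Counter)
    for day, label in records:
        if label in tones:
            per_day[day][label] += 1
    return {
        day: {t: per_day[day][t] / sum(per_day[day].values()) for t in tones}
        for day in sorted(per_day)
    }


def dominant_tone(shares):
    """Tone with the largest share on a given day."""
    return max(shares, key=shares.get)
```

Applying `dominant_tone` to each day's shares gives the kind of day-by-day comparison described above, such as joy leading early in the month and anticipation at the end.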
In addition, we investigated emotions over time. Table III reports the daily scores for each emotion, and Fig. 3 maps people's emotions against time. The emotional tones of anger, fear, and surprise increase over time, which reflects people's feelings about the situation, as a vaccine against COVID-19 had not yet been found. On the other hand, trust decreases, possibly because the number of cases did not decrease over time. Joy is relatively constant over time, which is understandable as people returned to their normal lives, while anticipation, sadness, and disgust show steady curves throughout the period.
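The daily scores in Table III and the curves in Fig. 3 could be produced along the lines of the sketch below. The paper does not specify the aggregation, so the sketch assumes that a day's score for an emotion is the mean of that emotion's EmoScore from (1) over all of that day's tweets, with a missing score counted as 0. The `trend_slope` helper is an addition that the paper does not use. It computes an ordinary least-squares slope over day indices, which offers one way to check a visual impression that a curve rises (positive slope), falls (negative slope), or stays flat (slope near zero).

```python
from collections import defaultdict


def daily_mean_scores(records, emotions):
    """records: iterable of (day, scores), scores mapping emotion -> EmoScore.

    Returns {day: {emotion: mean score over that day's tweets}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for day, scores in records:
        counts[day] += 1
        for e in emotions:
            sums[day][e] += scores.get(e, 0.0)   # missing score counts as 0
    return {d: {e: sums[d][e] / counts[d] for e in emotions} for d in sorted(counts)}


def trend_slope(values):
    """Least-squares slope of a daily series against day index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0                               # no trend from a single point
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

Passing one emotion's column of daily means, in date order, to `trend_slope` gives a single number per emotion that can be compared with the rising, falling, and steady curves described above.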
In summary, this study provides insights into the COVID-19 pandemic by using Twitter data to explore public feelings. The study only considers tweets from Saudi Arabia; however, the method can be adapted to study crises or pandemics in other cultures.

VI. CONCLUSION
Public emotional responses are dynamic and might change during a crisis, and people may experience different emotions at different stages of it. During the lockdown stage, several studies analyzed people's reactions to the virus on social media. Moreover, most research has studied Twitter data for sentiment analysis, whereas few studies have focused on emotion analysis. In this research, we focused on emotion mining during the stage in which the lockdown ended. We analyzed the emotions in Twitter posts during the COVID-19 crisis in Saudi Arabia, proposing a lexicon-based emotion classification method to explore textual data, supported by the necessary data visualizations. Our findings imply that joy dominated the other emotions at the beginning, but anticipation dominated later. Even though people showed positive emotions, they also expressed some negative emotions because of uncertainty about the behavior of the virus. In addition, we made interesting observations about the dynamics of emotions over time. Thus, this study presents insights into public emotions that could be used to understand and study the sociocultural system. It could also be useful in developing strategies to deal with the dynamics of the emotions associated with the crisis. Our research on COVID-19 continues: we are developing a model to discover frequent patterns in tweets.
www.ijacsa.thesai.org