On the Distinction of Subjectivity and Objectivity of Emotions in Texts

Emotion classification in texts is an instance of the text classification problem. Existing text classifiers can therefore be applied to it by treating each emotion as a label of the text. However, most recent work does not differentiate the subjectivity and objectivity of the same emotion in a text. This paper first builds datasets labelled with emotions, in which the subjective and objective forms of the same emotion are treated as two separate labels. Second, it evaluates several existing classifiers on these datasets under several scenarios. The results are then discussed with respect to the difficulties of this kind of problem.
Keywords: Text classification; Emotion classification; subjective emotion; objective emotion


I. INTRODUCTION
Emotion classification in text (or emotion recognition from text) is one of the popular instances of text classification. It has several applications. For instance, in consumer sentiment analysis, detecting emotion in consumer feedback may help other consumers choose the best seller, or help sellers improve their services. In human-machine interaction, where the computer and the user hold a conversation, recognizing the user's emotion in the conversation text may help the computer improve the effectiveness of the conversation. In measuring the similarity among social network users, recognizing emotion in users' posts (statuses), or in their comments on others' posts, could help measure the similarity of users' opinions on a topic. These data may then help to further analyse and/or predict the similarity of these users in their interests, their online shopping behavior, etc.
As this is an instance of the text classification problem, existing (and popular) text classifiers can be used. However, most recent work does not differentiate the subjectivity and objectivity of the same emotion in a text. For instance, consider the following two texts:
• Text A: I am happy
• Text B: They are happy, but me not
Most current text classifiers will assign both texts to the label joy, because they do not differentiate the happiness of the teller (subjective joy) from that of others (objective joy). With the distinction of subjectivity/objectivity, by contrast, the two texts may be assigned different labels: Text A may be assigned the label subjective joy, and Text B the labels objective joy and subjective sadness. Intuitively, emotion classification with this distinction could classify texts more precisely than classification without it, but it may also be more difficult.
This paper investigates the problem of emotion classification in texts with the distinction of subjectivity/objectivity. It first builds datasets labelled with emotions, in which the subjective and objective forms of the same emotion are treated as two separate labels. Second, it evaluates several existing classifiers on these datasets under several scenarios. The results are then discussed with respect to the difficulties of this kind of problem.
This paper is organised as follows: Section II presents related work. Section III presents the construction of the datasets of texts labelled with subjective/objective emotions. Section IV presents preliminary experiments on these datasets. Finally, Section V concludes.

II. RELATED WORKS
Research on emotion in computer science is mainly based on appraisal and cognitive theories of emotion, such as the cognitive structure of emotion of Ortony et al. [29], the cognitive pattern of emotion of Lazarus [13], [19] and the belief-desire theory of emotion (BDTE) of Reisenzein [34]. These efforts can be grouped into three main directions. First, approaches that represent the concept of emotion (Van Dyke Parunak et al. [30], and Stephane [37]), formalize some emotions in a formal logic (Meyer [21], Ochs et al. [28], and Bonnefon et al. [5], [6], [25]), or calculate the degree of emotions (Steunebrink et al. [38], Nguyen [26]). This direction is far from our work, so this paper does not investigate it.
Second, approaches that recognize emotion from facial expression (Ekman [11], Russe [36], Adolphs [1], and Busso et al. [9]). This direction is also far from our work, so this paper does not investigate it either.
Third, approaches that recognize emotion from text. This is an instance of the text classification problem, so any text classification method can be applied to emotion detection. For instance, we can use any of the classical classifiers (or an extension of one) such as Naive Bayes (NB) [16], Support Vector Machine (SVM) [8], k-Nearest Neighbors (kNN or IBk) [2], and C4.5 [33]. Moreover, some authors have improved a classical classifier for their own model. For instance, Danesh et al. proposed three improvements using Decision Template, Voting, or Ordered Weighted Averaging (OWA) [10]; Erkan et al. [12] proposed a model with a harmonic function; Nigam et al. [27] proposed expectation-maximization (EM); Kibriya et al. [17] proposed Multinomial Naive Bayes.
Recently, some authors have proposed models specifically for emotion detection in text. For instance, Alm et al. [3] used supervised machine learning with the SNoW learning architecture. Szpakowicz and colleagues [4], [14] used an annotation scheme. Li and Xu [20] tried to infer and extract the reasons for emotions by importing knowledge and theories from other fields such as sociology. Kralj et al. [18] investigated emojis as a means of expressing emotion on Twitter. Perikos and Hatzilygeroudis [31] used an ensemble of classifiers: two are statistical (a Naive Bayes and a Maximum Entropy learner) and the third is a knowledge-based tool performing deep analysis of natural language sentences. However, most of these models have not been tested on the subjectivity/objectivity of the same emotion. Therefore, this paper aims to evaluate some existing classifiers on the problem of emotion classification with the distinction of subjectivity/objectivity.

III. DATASET
This section presents some related datasets for the problem of emotion classification, and then builds datasets that support the classification of subjective/objective emotions.

A. Related datasets
Many datasets have been built for the problem of emotion classification. They can be divided into two groups. The first is the single-label group, in which a text has only one label: for instance, Plutchik [32], CrowdFlower [24], EmoLex [23], and Semeval2017 [22]. The second is the multi-label group, in which a text may have more than one label: for instance, Brat data [7]. These are presented in Table I. However, most related datasets do not support the distinction of subjectivity and objectivity of emotion: in these datasets, each label is by default a subjective emotion. Therefore, they cannot be used for the problem of emotion classification with the distinction of subjectivity/objectivity. This is the main reason why this paper builds new datasets to support this problem.

B. Built dataset
To build datasets for the problem of emotion classification with the distinction of subjectivity/objectivity, we collected texts from several sources: statuses on social networks, newspaper headlines, idioms and quotations, song lyrics, etc. These texts are then labelled with (subjective and/or objective) emotions. The emotions are mainly based on the cognitive definitions of Frijda [13] and Lazarus [19]. The texts are divided into two datasets by language: Vietnamese and English. The Vietnamese dataset has about 1500 samples, while the English dataset has about 800 samples. The distribution of samples over labels is presented in Table III, and the distribution of samples over the number of labels per sample is presented in Table II. These are multi-label datasets: each text may have more than one label.
One of the most important features of these two datasets is that their labels distinguish the subjectivity and the objectivity of an emotion. A text may carry only a subjective emotion, only an objective emotion, or both the subjective and objective forms of an emotion. For instance, the text "My dream becomes true!" may have two labels: subjective satisfaction and subjective joy. Meanwhile, the text "His dream becomes true, but not mine!" may have four labels: objective satisfaction and objective joy (for him), and subjective disappointment and subjective sadness (for the teller). Here, subjective satisfaction and objective satisfaction are considered two different labels. Therefore, although these two datasets contain only 14 different emotions, they have 28 different labels because of the distinction of subjectivity/objectivity.
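The labelling scheme described above can be sketched in a few lines of Python. This is only an illustration: the emotion list is a hypothetical subset (the full list of 14 emotions is not given in this section), and the "role emotion" string format, `build_label_space`, and `to_indicator` are assumptions made for the example.

```python
# Hypothetical subset of the 14 emotions; the full list is not given here.
EMOTIONS = ["joy", "sadness", "satisfaction", "disappointment"]
ROLES = ["subjective", "objective"]


def build_label_space(emotions, roles=ROLES):
    """Cross every emotion with every role: one label per (role, emotion) pair."""
    return [f"{role} {emotion}" for emotion in emotions for role in roles]


LABELS = build_label_space(EMOTIONS)

# Multi-label annotations for the two example texts quoted above.
ANNOTATIONS = {
    "My dream becomes true!": {"subjective satisfaction", "subjective joy"},
    "His dream becomes true, but not mine!": {
        "objective satisfaction",
        "objective joy",
        "subjective disappointment",
        "subjective sadness",
    },
}


def to_indicator(label_set, labels=LABELS):
    """Encode a set of labels as a 0/1 vector aligned with `labels`."""
    return [1 if label in label_set else 0 for label in labels]
```

With 14 emotions instead of the four used here, `build_label_space` yields the 28 labels mentioned in the text.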

IV. EXPERIMENTS

A. Experiment 1: Evaluation of the classifiers
The objective of this experiment is to find the most suitable classifier for these datasets. The classifier found will be used in the next experiments.
6 Note the observed output parameters for each run.
7 Repeat steps 5 to 6 ten times (10 folds) and take the mean value of each output parameter over all runs.
2) Output parameters: Let $O_i$ and $E_i$ be, respectively, the original set of labels and the extracted set of labels of text $i$. We make use of these parameters:
• The precision on sample $i$ is: $P_i = \frac{|O_i \cap E_i|}{|E_i|}$
• The precision on all $n$ samples in the test set is: $P = \frac{1}{n}\sum_{i=1}^{n} P_i$
• The recall on sample $i$ is: $R_i = \frac{|O_i \cap E_i|}{|O_i|}$
• The recall on all $n$ samples in the test set is: $R = \frac{1}{n}\sum_{i=1}^{n} R_i$
• The F1-score on all samples of the test set is: $F_1 = \frac{2PR}{P + R}$
For each experiment, we consider the results on three output parameters: Precision, Recall, and F1-score.
3) Results: The results are presented in Table IV. The SVM classifier gets the lowest value on all three output parameters, on both datasets. KNN gets higher values than SVM; C4.5 and RF get higher values than SVM; KNN gets higher values than C4.5; NB gets higher values than KNN. The MNB classifier gets the highest value on all three output parameters, on both datasets. Therefore, MNB is the classifier chosen for the next experiments.
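As a minimal sketch, the three output parameters could be computed as follows. It assumes the usual example-based definitions for multi-label classification: per-sample precision |O_i ∩ E_i| / |E_i|, per-sample recall |O_i ∩ E_i| / |O_i|, both averaged over the n test samples, and F1 as the harmonic mean of the two averages. The function name and the handling of empty label sets are assumptions of this example.

```python
def precision_recall_f1(original, extracted):
    """Compute (precision, recall, F1) over a test set.

    `original` and `extracted` are lists of label sets (O_i and E_i),
    aligned by sample index.
    """
    n = len(original)
    precisions, recalls = [], []
    for o_i, e_i in zip(original, extracted):
        common = len(o_i & e_i)
        # Assumption: an empty E_i (or O_i) contributes 0 rather than an error.
        precisions.append(common / len(e_i) if e_i else 0.0)
        recalls.append(common / len(o_i) if o_i else 0.0)
    p = sum(precisions) / n
    r = sum(recalls) / n
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Text A is fully recovered; for Text B only one of its two labels is found.
original = [{"subjective joy"}, {"objective joy", "subjective sadness"}]
extracted = [{"subjective joy"}, {"objective joy"}]
p, r, f1 = precision_recall_f1(original, extracted)
```

In this toy case every extracted label is correct, so precision is 1, while recall drops to 0.75 because one label of Text B is missed.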

B. Experiment 2: The effects of the stop-words
The objective of this experiment is to test the effect of stop-words on distinguishing subjective from objective emotion in texts. It therefore compares two pre-processing strategies: removing stop-words from the texts (without) or keeping them (with).
1) Scenario: This experiment follows the scenario below for each dataset:
1 For each text in the dataset, consider two cases:
1.1 All stop-words are removed.
1.2 Stop-words are not removed.
2 Split the remaining character sequence into 1-grams.
3 Transform each text into a vector of TF-IDF values.
4 Use k-fold cross-validation: split the dataset into ten sets (10 folds). Each time, one set is used for testing and the nine remaining sets are used for training.
5 Train and test with the Multinomial Naive Bayes (MNB) classifier.
6 Note the observed output parameters for each run.
7 Repeat steps 5 to 6 ten times (10 folds) and take the mean value of each output parameter over all runs.
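The two pre-processing cases in step 1 can be sketched as below, using the two texts from the introduction. The stop-word list is a deliberately tiny, hypothetical one, and the tokenizer is a simple assumption; neither is taken from the experiments.

```python
import re

# Hypothetical, deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"i", "me", "they", "am", "are", "but", "not"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (1-grams)."""
    return re.findall(r"[a-z']+", text.lower())


def preprocess(text, remove_stop_words):
    """Return the 1-grams of `text`, with or without stop-words."""
    tokens = tokenize(text)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


TEXT_A = "I am happy"
TEXT_B = "They are happy, but me not"

without_a = preprocess(TEXT_A, remove_stop_words=True)
without_b = preprocess(TEXT_B, remove_stop_words=True)
with_b = preprocess(TEXT_B, remove_stop_words=False)
```

Under this stop-word list both texts reduce to the single token "happy" once stop-words are removed, so the words that identify who feels the emotion are lost. This is one plausible reading of why the two strategies can behave differently, not a finding reported by the experiment.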
In this experiment, the same three output parameters are used: Precision, Recall, and F1-score.
2) Results: The results are presented in Table V for the three output parameters, on both datasets. In terms of precision, the value with stop-words is higher than the value without stop-words, on both datasets. In terms of recall, the value with stop-words is lower than the value without stop-words, on both datasets. However, in terms of F1-score, the value with stop-words is higher than the value without stop-words, on both datasets.
Based on these results, in the next experiment, stop-words are not removed from the texts.

C. Experiment 3: The effects of N-gram
The objective of this experiment is to find the best n for n-gram extraction from the texts. It considers five gram-extraction strategies: using only 1-grams, 1 to 2-grams, 1 to 3-grams, 1 to 4-grams, and 1 to 5-grams.
1) Scenario: This experiment follows the scenario below for each dataset:
1 For each text in the dataset, remove all stop-words.
2 Each time, use one of the following five gram-extraction strategies:
2.1 1-gram: using only 1-grams.
2.2 1 to 2-grams: using 1-grams and 2-grams.
2.3 1 to 3-grams: using 1-grams to 3-grams.
2.4 1 to 4-grams: using 1-grams to 4-grams.
2.5 1 to 5-grams: using 1-grams to 5-grams.
3 Transform each text into a vector of TF-IDF values.
4 Use k-fold cross-validation: split the dataset into ten sets (10 folds). Each time, one set is used for testing and the nine remaining sets are used for training.
5 Train and test with the Multinomial Naive Bayes (MNB) classifier.
6 Note the observed output parameters for each run.
7 Repeat steps 5 to 6 ten times (10 folds) and take the mean value of each output parameter over all runs.
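The gram-extraction strategies of step 2 can be sketched as follows. The sketch assumes word-level n-grams (the text does not state whether grams are words or characters), and `word_ngrams` and `MAX_N_VALUES` are names introduced only for this example.

```python
def word_ngrams(tokens, max_n):
    """Return all word n-grams for n = 1..max_n, ordered by increasing n."""
    grams = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


# The five strategies compared in this experiment, expressed as the maximum n.
MAX_N_VALUES = [1, 2, 3, 4, 5]

tokens = ["they", "are", "happy"]
features_by_strategy = {max_n: word_ngrams(tokens, max_n) for max_n in MAX_N_VALUES}
```

For a three-word text, raising the maximum n beyond 3 adds no new features, since no longer gram exists.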
In this experiment, the same three output parameters are used: Precision, Recall, and F1-score.
2) Results: The results are presented in Table VI. Generally, the higher the maximum n, the higher the values of the output parameters, on both datasets. However, from 3-grams onwards the increase in the output parameters slows down, and there is no significant difference in the three output parameters among the cases of 3-grams, 4-grams, and 5-grams. Therefore, it is sufficient to use grams up to 3-grams.

D. Experiment 4: The difficulty of the problem
This experiment compares emotion classification with and without the distinction of subjectivity/objectivity, to see how hard the problem with the distinction is compared to the classical problem of emotion classification without it.
1) Scenario: This experiment follows the scenario below for each dataset:
1 For each text in the dataset, do not remove stop-words.
2 Split the remaining character sequence into 1-grams to 3-grams.
3 Transform each text into a vector of TF-IDF values.
4 Use k-fold cross-validation: split the dataset into ten sets (10 folds). Each time, one set is used for testing and the nine remaining sets are used for training.
7 Repeat steps 5 to 6 ten times (10 folds) and take the mean value of each output parameter over all runs.
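Step 3 of the scenarios, transforming each text into a TF-IDF vector, can be sketched as below. TF-IDF has several variants and the text does not say which one was used; this example assumes raw term counts for TF and idf = log(N / df), whereas libraries often apply smoothing and normalisation. `tfidf_vectors` is a name introduced for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into dense TF-IDF vectors.

    `docs` is a list of token (or n-gram) lists.
    Returns (vocabulary, vectors), with each vector aligned to the vocabulary.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(doc_freq)
    # Assumption: unsmoothed idf = log(N / df).
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] * idf[term] for term in vocab])
    return vocab, vectors


docs = [["i", "am", "happy"], ["they", "are", "happy"]]
vocab, vectors = tfidf_vectors(docs)
```

Under this variant a term that occurs in every document, such as "happy" here, gets weight 0, so the weight falls on the terms that differ between the two texts.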
In this experiment, the same three output parameters are used: Precision, Recall, and F1-score.
2) Results: The results are presented in Table VII for the three output parameters. Unsurprisingly, the output values for the first problem, classification of emotion without the distinction of subjectivity/objectivity (14 labels), are much higher than those for the third problem, classification of emotion with the distinction of subjectivity/objectivity (28 labels), on both datasets. The output values for the second problem, classification of subjectivity/objectivity only (2 labels), are in turn much higher than those for the first problem (14 labels), on both datasets.
There are two reasons to consider. First, in emotion classification with the distinction of subjectivity/objectivity, the number of labels is double that of classical emotion classification. Generally, in a classification problem, the higher the number of labels to classify, the more difficult the problem.
Second, there is the difficulty of differentiating between the two labels of the same emotion that differ only in being subjective or objective. Let us return to the example from the introduction:
• Text A: I am happy
• Text B: They are happy, but me not
In emotion classification without subjectivity/objectivity, it is easy to detect that both texts belong to the label joy. With the distinction of subjectivity/objectivity, however, the results are totally different: Text A belongs to the label subjective joy, while Text B belongs to the two labels objective joy and subjective sadness. This shows the difficulty of classifying these texts between the two labels subjective joy and objective joy.
These results indicate that emotion classification with the distinction of subjectivity/objectivity is much more difficult than classical emotion classification without it. Consequently, the classifiers could not reach an average level on the output parameters, whereas they reach an above-average level when applied to emotion classification without the distinction of subjectivity/objectivity. This can be considered a challenge for research in the near future.

V. CONCLUSION
This paper considered the problem of emotion classification with the distinction of subjectivity/objectivity. Two datasets of texts labelled with subjective/objective emotions were built and introduced, one in English and one in Vietnamese. The paper also carried out some very preliminary experiments to evaluate current statistical classifiers on this kind of problem. The results indicate two differences from the classical emotion classification problem (without the distinction of subjectivity/objectivity). First, keeping stop-words is better for differentiating the subjectivity and the objectivity of emotion in texts. Second, current statistical classifiers such as SVM, KNN, C4.5, NB, RF, and MNB may no longer be helpful for the given problem. These difficulties are our challenges and objectives for work in the near future.

4 Use k-fold cross-validation: split the dataset into ten sets (10 folds). Each time, one set is used for testing (called the testing set), and the nine remaining sets are used for training (called the training set).
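The 10-fold procedure, together with the averaging of the output parameters over the ten runs, can be sketched as follows. Shuffling the samples before splitting and the fixed seed are assumptions of this example; `k_fold_splits`, `cross_validate`, and the `evaluate` callback are hypothetical names.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (training_indices, testing_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Assumption: samples are shuffled once before being split into folds.
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        testing = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, testing


def cross_validate(n_samples, evaluate, k=10):
    """Average the (precision, recall, f1) tuples returned by `evaluate`.

    `evaluate(training, testing)` is a hypothetical callback that trains a
    classifier on the training indices and scores it on the testing indices.
    """
    scores = [evaluate(tr, te) for tr, te in k_fold_splits(n_samples, k)]
    return tuple(sum(score[m] for score in scores) / k for m in range(3))


splits = list(k_fold_splits(800))
```

With about 800 samples, as in the English dataset, each run trains on 720 samples and tests on the remaining 80.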

TABLE I. SOME RELATED EMOTION DATASETS

TABLE II. DISTRIBUTION OF EMOTION NUMBER IN TWO DATASETS

TABLE III. DISTRIBUTION OF EMOTION TYPE IN TWO DATASETS

5 Train and test with the Multinomial Naive Bayes (MNB) classifier in three cases:
5.1 Emotion only: Only emotions are differentiated. The subjective and objective forms of an emotion are considered the same label, namely the emotion itself. For example, subjective joy and objective joy are both considered the label joy. In this case, there are only 14 labels to classify.
5.2 Subjectivity/objectivity only: Only the subjectivity and objectivity of emotions are differentiated. The subjective forms of all emotions are considered a single label, and likewise for the objective forms. For example, subjective joy, subjective disappointment, and subjective anger are all considered the label subjective. In this case, there are only two labels to classify (subjective and objective).
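The two label reductions described in cases 5.1 and 5.2 can be sketched as simple mappings over a text's label set. The sketch assumes that each full label is stored as a "role emotion" string such as "subjective joy"; this format and the function names are illustrative assumptions.

```python
def emotion_only(labels):
    """Case 5.1: drop the role, keep the emotion ('subjective joy' -> 'joy')."""
    return {label.split(" ", 1)[1] for label in labels}


def role_only(labels):
    """Case 5.2: drop the emotion, keep the role ('subjective joy' -> 'subjective')."""
    return {label.split(" ", 1)[0] for label in labels}


# Full labels of Text B from the introduction.
text_b_labels = {"objective joy", "subjective sadness"}
text_b_emotions = emotion_only(text_b_labels)
text_b_roles = role_only(text_b_labels)
```

Applying `emotion_only` to `{"subjective joy", "objective joy"}` returns the single label `{"joy"}`, which is how the full label set collapses to the 14 emotion labels in case 5.1.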