An Intelligent Metaheuristic Optimization with Deep Convolutional Recurrent Neural Network Enabled Sarcasm Detection and Classification Model

Abstract: Sarcasm is a form of speech in which the speaker says something that is outwardly unfriendly with the purpose of abusing or deriding the listener and/or a third person. Since sarcasm detection depends mainly on the context of utterances or sentences, it is hard to design a model that detects sarcasm proficiently in the domain of natural language processing (NLP). Although various sarcasm detection methods have been built on statistical machine learning and rule-based approaches, they are incapable of discerning the figurative meanings of words. Models developed using deep learning approaches have shown superior performance for sarcasm detection over traditional approaches. With this motivation, this paper develops a novel deep learning (DL) enabled sarcasm detection and classification (DLE-SDC) model. The DLE-SDC technique first applies a pre-processing stage that encompasses single character removal, multispace removal, URL removal, stop word removal, and tokenization. The preprocessed data is then converted into feature vectors by the GloVe embeddings technique. Next, a convolutional neural network with recurrent neural network (CNN-RNN) technique is utilized to detect and classify sarcasm. To boost the detection outcomes of the CNN-RNN technique, a hyperparameter tuning process using the teaching and learning based optimization (TLBO) algorithm is employed to increase the classification performance. The DLE-SDC model is validated using a benchmark dataset, and the performance is examined in terms of precision, recall, accuracy, and F1-score.


I. INTRODUCTION
Sarcasm detection in discussions has become increasingly popular among natural language processing (NLP) researchers with the greater use of communication threads on social networking platforms. Natural language is an essential data source of human emotions. Automatic sarcasm detection is frequently framed as an NLP problem since it mainly requires understanding the language of human emotions and the expressions conveyed by textual and non-textual content. Sarcasm detection has gained more attention in recent decades since it facilitates precise analysis of online reviews and comments [1]. As a figurative device, sarcasm uses words in a manner that differs from their conventional meaning and order, which misleads polarity classification. The results obtained in this area can be used for information categorization.
Sarcasm can be regarded as an implied form of emotion. Usually, it conveys the reverse of what is literally said. Generally, sarcasm is related to literary devices such as satire, wit, and irony, which are used to insult, refute, amuse, or make fun of someone. For example, a teacher exclaims, "Credit to your hard work. I have never been more impressed in my lifetime. Lol!" Taken literally, these sentences express gratitude, but the speaker's expression and the context reveal their sarcastic intent. In the absence of visible expression, identifying sarcasm on Twitter is challenging. An interesting view of sarcasm was proposed in [2], in which the analysis was carried out for two sarcastic states: allocentric and egocentric. The egocentric state indicates that the sarcasm is perceived only from the participant's point of view and not from the addressee's, whereas the allocentric state indicates sarcasm perceived from both the addressee's and the participant's perspectives. The overall finding is that prosodic features, which include patterns of sound and stress, are more useful for identifying sarcasm than contextual features.
Basic sentiment analysis of text may not be effective for understanding the true intent because of the presence of literary devices such as irony and sarcasm [3]. Thus, sarcasm detection is needed to avoid misinterpretation in communication and to ensure that the intended meaning of a statement is understood correctly. Automatically identifying sarcasm is a difficult task, as research on automatic sarcasm analysis and detection has shown. Identifying sarcastic statements has become an essential process in social networking applications since it affects organizations that mine social networking data. Although many potential features can be extracted from text, they can be grouped into major classes, such as contextual, lexical, pragmatic, and hyperbolic features [4]. The fundamental objective of this study is to classify sarcasm into different kinds, which aids in understanding the intent to hurt or the level of hurt present in sarcastic statements. Because sarcasm may elicit a broad range of feelings in a person, it can either make the receiver laugh or, in the worst case, cause a deeper sense of emotional harm. Type detection can be effective in understanding the sentiments behind sarcasm, which offers insight into the emotional state of the people engaged in a sarcastic discussion, namely the person at whom the sarcasm is aimed and the person who employs it.
Several machine learning, rule-based, deep learning, and statistical methods have been reported in related works on automated sarcasm detection in a single sentence, frequently based on the content of words in isolation. These involve a variety of methods such as multimodal (text and image) content [5], sense disambiguation, and polarity flip detection in text [6]. Previous research on detecting sarcasm in text uses pragmatic (context) and lexical (content) clues [7] such as sentiments, interjections, and punctuation alterations, which are major indicators of sarcasm [8]. The features in these studies are handcrafted and cannot be generalized due to the presence of metaphorical slang and informal language, which are often used in online communication. With the development of DL methods, recent research [9,10] uses neural networks (NN) to learn contextual and lexical features, eliminating the need for handcrafted features. In such approaches, word embeddings are used to train recurrent, deep convolutional, or attention-based neural networks to achieve state-of-the-art results on a variety of large-scale datasets. This paper develops a novel deep learning (DL) enabled sarcasm detection and classification (DLE-SDC) model. The DLE-SDC technique first applies a pre-processing stage which takes place at different levels. Then, the GloVe embedding technique is used for the representation of word vectors. Moreover, a convolutional neural network with recurrent neural network (CNN-RNN) technique is utilized to detect and classify sarcasm. To boost the detection outcomes of the CNN-RNN technique, a hyperparameter tuning process using the teaching and learning based optimization (TLBO) algorithm is employed to increase the classification performance. A wide range of simulations is carried out on benchmark datasets, and the results are validated in terms of different measures.

II. LITERATURE REVIEW
In Nayel et al. [11], a method based on a supervised ML approach, namely the support vector machine (SVM), was utilized for detecting sarcasm. The presented method was evaluated on the ArSarcasm-v2 dataset. Its performance was compared with that of other methods submitted to the shared task on sarcasm detection and sentiment analysis. Kumar and Harish [12] proposed a new method for classifying sarcastic text with a content-based feature selection (FS) technique. The proposed method is composed of a two-phase FS process for selecting the most representative features. In the first phase, traditional FS approaches such as mutual information (MI), information gain (IG), and Chi-square are utilized for selecting an appropriate feature subset. The selected feature subset is further refined in the second phase, in which a k-means clustering process is utilized for selecting the most representative features among similar features. The selected features are classified by two classifiers, SVM and random forest (RF). Chatterjee et al. [13] designed features to detect sarcasm using realistic features that consider the context of a word. The method is based on a linguistic model that describes how humans differentiate among various kinds of untruth. They then train different ML based classifiers and compare their accuracy.
Razali et al. [14] focus on detecting sarcasm in tweets by combining DL-derived features with contextual handcrafted feature sets. A feature set is extracted from a CNN framework and carefully combined with the handcrafted feature sets. These custom feature sets are developed based on their contextual explanation. Every feature set is specifically designed for the sole task of detecting sarcasm. The aim is to find the optimum features. A few sets are beneficial even when used individually, whereas other sets are not really useful without integration. The experimental results are positive in terms of precision, accuracy, F1-measure, and recall. The combined features are classified by several ML methods for the purposes of comparison, and logistic regression (LR) is found to be the best classification approach for this work. In Rajeswari and ShanthiBala [15], a supervised classification method, viz. MNNB, is utilized for detecting sarcasm, and SVM is utilized for detecting the types of sarcasm. In this work, sarcasm is extracted from tweets using MNNB. The tweets contain noisy messages, which are handled to enable efficient detection of sarcasm. Additionally, the types of sarcasm are detected for diagnosing the state of the user.
Zhang et al. [16] proposed the use of NN for detecting sarcastic tweets and compared the impact of continuous automatic features with that of discrete manual features. In particular, they utilize a bi-directional gated RNN for capturing syntactic and semantic information over tweets, and a pooling NN for extracting contextual features automatically from past tweets. Akula and Garibay [17] concentrate on identifying sarcasm in textual conversations from different social networking and online platforms. To this end, they developed an interpretable DL method with gated recurrent units and multi-head self-attention. The major goal of the work in [18] is the sentiment analysis of people's opinions expressed on Facebook regarding the current pandemic situation in a low-resource language. To this end, the authors built a large-scale dataset consisting of 10,742 automatically categorized comments in the Albanian language. Moreover, they reported their efforts on the design and development of a sentiment analyser based on a DL approach. They also reported the experimental findings obtained from this sentiment analyser with different classification methods using static and contextualized word embeddings, i.e., fastText and BERT, trained and validated on the collected and curated dataset. Das and Kolya [19] automatically extracted the sarcastic word distribution properties of a common pop culture sarcasm corpus, which includes sarcastic speeches and dialogues. Further, they proposed an amalgamation of four parallel (4p) LSTM networks, each with a unique activation classifier. These models are mainly intended to identify sarcasm effectively from the text corpus.
Sundararajan and Palanisamy [20] aim to enhance existing methods by integrating a novel view that categorizes sarcasm on the basis of the level of harshness employed. The main application of the proposed study is associating the mood of an individual with the type of sarcasm shown by him/her, which can give key insights into the emotional behaviour of that individual. An ensemble-based FS approach was proposed for choosing the optimum set of features for detecting sarcasm in tweets. This optimal set of features was used to determine whether the tweets were sarcastic or not. After identifying the sarcastic sentences, a multi-rule based method was proposed for determining the sarcasm types. Kumar et al. [21] used MUStARD, a typical conversational dataset, to examine the use of an ensemble supervised learning approach for identifying sarcasm. Furthermore, it can be useful in reducing model bias and assisting decision makers in knowing how to use this model accurately. Liyuan Liu et al. [22] proposed a method called A2Text-Net, which combines auxiliary variables to improve the performance of sarcastic sentiment classification.

III. THE PROPOSED MODEL
This study develops the DLE-SDC technique to classify the presence of sarcasm. The working process is illustrated in Fig. 1. The proposed method involves different processes, namely preprocessing, GloVe based word vector representation, CNN-RNN based classification, and TLBO based parameter optimization.

A. Data Pre-processing
At the first stage, the data is pre-processed to transform it into a compatible format. The different sub-processes involved in data pre-processing are:
• Remove single letter words.
• Remove multiple spaces.
• Remove punctuation marks.
• Remove numbers.
• Remove stop words.
• Convert uppercase characters into lowercase.

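The pre-processing steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: the function name preprocess, the tiny STOP_WORDS set, and the order in which the steps are applied are assumptions made here, and URL removal is included because the abstract mentions it.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "on", "it", "this", "that"}

# Translation table that turns every ASCII punctuation mark into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> list[str]:
    """Apply the cleaning steps from the list above and return the remaining tokens."""
    text = text.lower()                                  # uppercase -> lowercase
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URL removal (mentioned in the abstract)
    text = re.sub(r"\d+", " ", text)                     # remove numbers
    text = text.translate(_PUNCT_TO_SPACE)               # remove punctuation marks
    text = re.sub(r"\b\w\b", " ", text)                  # remove single-letter words
    text = re.sub(r"\s+", " ", text).strip()             # collapse multiple spaces
    return [tok for tok in text.split() if tok not in STOP_WORDS]  # stop words + tokenization


tokens = preprocess("Oh GREAT, another 3 hour meeting... I love it!  See http://example.com/x")
print(tokens)
```

Lowercasing and URL removal are done first so that the later regular expressions see uniform text; a different order would also be defensible.
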
B. GloVe based Word Representation
The GloVe approach generates vector representations of words by treating the similarity between words as an invariant. It builds on two earlier methods, CBOW and Skip-gram. The problems with these conventional methods include low accuracy and long processing time. The primary objective of GloVe is to combine the strengths of the two techniques so that high accuracy is assured. Before the GloVe representation is generated, the vector representation of words is determined: a vector with a fixed dimension (d) is generated for every word. The approach uses the similarity between two words as an invariant, in which words that occur in similar contexts are assumed to have similar meanings.
The following terms are defined before presenting the formulation of GloVe:
• Let $X$ denote the matrix of word-to-word co-occurrence counts, whose entry $X_{ij}$ stores the number of times word $j$ occurs in the context of word $i$.
• Let $X_i = \sum_k X_{ik}$ represent the number of times any word appears in the context of word $i$.
• Finally, let $P_{ij} = P(j \mid i) = X_{ij} / X_i$ denote the probability that word $j$ appears in the context of word $i$.
Consider two words $i$ and $j$ that are related within a subject; e.g., suppose that cricket is the subject matter, so that $i$ = duck and $j$ = boundary. Analyzing the ratio of co-occurrence probabilities $P_{ik}/P_{jk}$ with distinct probe words $k$ reveals the relationship between those words. For a word $k$ that is associated with duck but not with boundary, the ratio is large. Similarly, for a word $k$ that is associated with boundary but not with duck, say $k$ = six, the ratio is small. For words such as score, which are relevant to both duck and boundary, the ratio is nearly 1. This suggests that the ratio of co-occurrence probabilities can be used as a starting point for calculating the similarity between terms. The ratio depends on three words $i$, $j$, and $k$, so the general model takes the form
$$F(w_i, w_j, \hat{w}_k) = \frac{P_{ik}}{P_{jk}} \tag{1}$$
where $w$ denotes a word vector and $\hat{w}$ represents a separate context word vector. As vector spaces are inherently linear structures, $F$ is made to depend on the difference of the two word vectors:
$$F(w_i - w_j, \hat{w}_k) = \frac{P_{ik}}{P_{jk}} \tag{2}$$
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 2, 2022, www.ijacsa.thesai.org
Applying algebraic and group-theoretic arguments gives
$$F\big((w_i - w_j)^{T}\hat{w}_k\big) = \frac{F(w_i^{T}\hat{w}_k)}{F(w_j^{T}\hat{w}_k)} \tag{3}$$
in which $w^{T}\hat{w}$ represents the dot product between two vectors and $F$ is the exponential function. Hence
$$w_i^{T}\hat{w}_k = \log P_{ik} = \log X_{ik} - \log X_i \tag{4}$$
and since $\log X_i$ is a constant for word $i$, the above relation is rewritten as
$$w_i^{T}\hat{w}_k + b_i + \hat{b}_k = \log X_{ik}$$
in which $b_i$ represents a bias for word $i$ and $\hat{b}_k$ indicates a bias for word $k$. From an ML perspective, the RHS of this equation is calculated from the corpus, and the LHS must be updated to reproduce it. Thus, the LHS plays the role of the hypothesis ($h$), while the RHS is the output ($y$). The cost function is then formed using the least squares method, summing over all word pairs in a vocabulary of size $V$:
$$J = \sum_{i,j=1}^{V} \left(w_i^{T}\hat{w}_j + b_i + \hat{b}_j - \log X_{ij}\right)^2$$
and it is necessary to minimize the cost function. However, before gradient descent (GD) is applied, each term of the cost function is multiplied by a weight for the corresponding pair of words; hence, the cost function may be thought of as a memory that preserves information according to the previously calculated co-occurrence values:
$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left(w_i^{T}\hat{w}_j + b_i + \hat{b}_j - \log X_{ij}\right)^2$$
where $f(X_{ij})$ denotes the weight associated with the co-occurrence of term $i$ with term $j$. Generally, $f$ is defined as
$$f(x) = \begin{cases} (x / x_{\max})^{3/4}, & x < x_{\max} \\ 1, & \text{otherwise} \end{cases}$$
The partial derivative of $J$ with respect to $w_i$ is then
$$\frac{\partial J}{\partial w_i} = \sum_{j=1}^{V} 2 f(X_{ij}) \left(w_i^{T}\hat{w}_j + b_i + \hat{b}_j - \log X_{ij}\right) \hat{w}_j$$
in which $d$ implies the dimension of the word vectors. Since $w_i$ is a $d$-dimensional vector for word $i$, the derivative is also a vector of $d$ components:
$$\frac{\partial J}{\partial w_i} = \left[\frac{\partial J}{\partial w_{i1}}, \ldots, \frac{\partial J}{\partial w_{id}}\right]^{T}$$
Hence, the derivatives are calculated word by word, and GD is applied with learning rate $\alpha$, i.e., $w_i \leftarrow w_i - \alpha \, \partial J / \partial w_i$, to train the model. Once the model is trained, words with similar meanings can be retrieved by supplying an arbitrary word as input. For the word business, for example, words with similar meanings include market, industry, products, share market, and stock.
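To make the weighted least-squares objective and its gradient-descent update concrete, the following NumPy sketch performs full-batch updates on a toy co-occurrence matrix. It is an illustration under stated assumptions rather than the training code of the study: the names glove_weight and glove_step, the constants x_max = 100 and power = 0.75, the learning rate, and the 3 x 3 matrix X are all hypothetical choices made here.

```python
import numpy as np


def glove_weight(x, x_max=100.0, power=0.75):
    """Weight f(X_ij): grows with the count and is capped at 1 (assumed standard form)."""
    return np.where(x < x_max, (x / x_max) ** power, 1.0)


def glove_step(X, W, W_ctx, b, b_ctx, lr=0.05):
    """One full-batch gradient-descent step on J; returns the cost before the update."""
    mask = X > 0                                   # log X_ij is only defined for non-zero counts
    log_X = np.log(np.where(mask, X, 1.0))
    f = glove_weight(X) * mask                     # zero counts contribute nothing to J
    err = W @ W_ctx.T + b[:, None] + b_ctx[None, :] - log_X
    cost = float(np.sum(f * err ** 2))
    g = 2.0 * f * err                              # derivative of J w.r.t. each residual
    grad_W, grad_W_ctx = g @ W_ctx, g.T @ W        # sums over j (resp. i) of g_ij * vector
    grad_b, grad_b_ctx = g.sum(axis=1), g.sum(axis=0)
    W -= lr * grad_W                               # in-place updates of the caller's arrays
    W_ctx -= lr * grad_W_ctx
    b -= lr * grad_b
    b_ctx -= lr * grad_b_ctx
    return cost


# Toy symmetric co-occurrence counts for a 3-word vocabulary (made-up numbers).
X = np.array([[0.0, 4.0, 1.0],
              [4.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])
rng = np.random.default_rng(0)
V, d = X.shape[0], 2
W = rng.normal(scale=0.1, size=(V, d))
W_ctx = rng.normal(scale=0.1, size=(V, d))
b, b_ctx = np.zeros(V), np.zeros(V)

costs = [glove_step(X, W, W_ctx, b, b_ctx) for _ in range(200)]
print(costs[0], costs[-1])
```

Practical GloVe training typically samples non-zero entries stochastically instead of using the dense full-batch update shown here; the dense form is used only because it mirrors the summation in the cost function directly.
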

C. Sarcasm Detection using CNN-RNN Technique
The extracted word feature vectors are fed into the CNN-RNN technique for the classification of sarcasm. An RNN is a kind of NN that maintains an internal hidden state to model the dynamic temporal behaviour of sequences of arbitrary length through directed cyclic connections among its units. It can be regarded as an extension of the hidden Markov model that applies a nonlinear transition function and is able to model long-term temporal dependencies. LSTM extends the RNN by including a forget gate that controls whether to forget the present state, an input gate that indicates whether it should read the input, and an output gate that controls whether to output the state [23]. These gates enable the LSTM to learn long-term dependencies in a sequence and also make it easier to optimize, since the gates help the input signal to propagate effectively through the recurrent hidden states without affecting the output. Also, LSTM effectively handles the exploding or vanishing gradient problems that usually appear in RNN training. Fig. 2 illustrates the framework of the CNN model.
$$x_t = \delta(U_r \cdot r(t-1) + U_w \cdot w_k(t))$$
$$i_t = \delta(U_{i_r} \cdot r(t-1) + U_{i_w} \cdot w_k(t))$$
$$f_t = \delta(U_{f_r} \cdot r(t-1) + U_{f_w} \cdot w_k(t))$$
$$o_t = \delta(U_{o_r} \cdot r(t-1) + U_{o_w} \cdot w_k(t))$$
$$r(t) = f_t \odot r(t-1) + i_t \odot x_t$$
$$o(t) = r(t) \odot o_t$$
in which $\delta(\cdot)$ denotes an activation function, $\odot$ indicates the product with the gate value, and the various $U$ matrices are learned parameters. The rectified linear unit (ReLU) is used as the activation function in this work. A new CNN-RNN architecture is employed for multi-label classification problems. It consists of two parts: the CNN extracts the semantic representation from the image, and the RNN models the image/label relationship and the label dependency. The recurrent, label, and image representations are projected into the same low-dimensional space to model the image-text relationship and the label redundancy. Fig. 3 demonstrates the structure of the RNN model. The RNN is employed as a compact yet powerful representation of the label co-occurrence dependency in this space. It takes the embedding of the predicted label at every time step and maintains a hidden state to model the label co-occurrence information. The a priori probability of a label given the previously predicted labels can be calculated from the dot product of its embedding with the sum of the image and recurrent embeddings.
A label $k$ is denoted as a one-hot vector $e_k = [0, \ldots, 0, 1, 0, \ldots, 0]$, which is 1 at the $k$th position and 0 elsewhere. The label embedding is obtained by multiplying the one-hot vector by a label embedding matrix $U_l$. The $k$th row of $U_l$ denotes the label embedding of label $k$:
$$w_k = U_l \cdot e_k \tag{10}$$
The dimension of $w_k$ is generally lower than the number of labels. The recurrent layer takes the label embedding of the previously predicted label and models the co-occurrence dependency in its hidden recurrent state by learning nonlinear functions:
$$o(t) = h_o(r(t-1), w_k(t)), \quad r(t) = h_r(r(t-1), w_k(t)) \tag{11}$$
where $r(t)$ and $o(t)$ represent the hidden state and the output of the recurrent layer at time step $t$, respectively, $w_k(t)$ indicates the label embedding of the $t$-th label in the prediction path, and $h_o(\cdot)$, $h_r(\cdot)$ signify the nonlinear RNN functions. The output of the recurrent layer and the image representation are projected into the same low-dimensional space as the label embedding:
$$x_t = h(U_o^{x} \, o(t) + U_I^{x} \, I) \tag{12}$$
where $U_o^{x}$ and $U_I^{x}$ denote the projection matrices for the recurrent layer output and the image representation, respectively. The number of columns of $U_o^{x}$ and $U_I^{x}$ is the same as that of the label embedding matrix $U_l$, and $I$ represents the CNN image representation. The learned joint embedding is intended to characterize the relevance between labels and images. Lastly, the label scores are calculated by multiplying the transpose of $U_l$ by $x_t$, which computes the distances between $x_t$ and every label embedding:
$$s(t) = U_l^{T} x_t \tag{13}$$
The predicted label probability is calculated by softmax normalization of the scores. Fig. 4 depicts the architecture of the LSTM model.
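The chain from label embedding to recurrent update, joint projection, and softmax scoring can be traced with a small NumPy sketch. Everything below is a hedged illustration: the dimensions K, d, h, m, the random weights, and the function predict_step are hypothetical, labels are stored as rows of U_l so that a row lookup gives the embedding, and a plain ReLU recurrence stands in for the gated LSTM update to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: labels, embedding dim, recurrent dim, CNN feature dim.
K, d, h, m = 2, 4, 5, 6

U_l = rng.normal(size=(K, d))          # label embedding matrix, row k = embedding of label k
U_r = rng.normal(size=(h, h)) * 0.1    # recurrent weights (stand-in for the LSTM parameters)
U_w = rng.normal(size=(h, d)) * 0.1    # input weights for the label embedding
U_o = rng.normal(size=(d, h))          # projects the recurrent output into label space
U_I = rng.normal(size=(d, m))          # projects the CNN representation into label space


def relu(z):
    return np.maximum(z, 0.0)


def softmax(s):
    e = np.exp(s - np.max(s))          # subtract the max for numerical stability
    return e / e.sum()


def predict_step(prev_label, r_prev, cnn_feat):
    """One prediction step: returns label probabilities and the new hidden state."""
    e_k = np.eye(K)[prev_label]                # one-hot vector of the previous label
    w_k = e_k @ U_l                            # label embedding (row lookup)
    r_t = relu(U_r @ r_prev + U_w @ w_k)       # simplified recurrent update, not a full LSTM
    o_t = r_t                                  # output taken equal to the hidden state here
    x_t = relu(U_o @ o_t + U_I @ cnn_feat)     # joint projection into label-embedding space
    s_t = U_l @ x_t                            # one dot-product score per label
    return softmax(s_t), r_t


probs, r_t = predict_step(prev_label=0, r_prev=np.zeros(h), cnn_feat=rng.normal(size=m))
print(probs)
```

With two labels (sarcastic and non-sarcastic) the softmax output is simply a pair of class probabilities; the same code handles more labels by changing K.
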

D. Hyperparameter Optimization using TLBO Algorithm
At the final stage, the learning rate of the CNN-RNN technique is optimally chosen by the TLBO algorithm so that the sarcasm detection outcome is improved. The TLBO technique is a metaheuristic approach based on a teaching-learning model. It was introduced by Rao et al. [24] for solving optimization problems. It simulates the transfer of knowledge in a class, where students first gain knowledge from the teacher and then through mutual interaction. TLBO is a population-based optimization technique in which the group or class of students is regarded as the population. Each student in the class thus represents a possible solution to the problem. The TLBO technique includes the two stages given below.

1) Teacher level:
This level defines the learning of students from the teacher. The teacher attempts to improve the knowledge level of the students and helps them obtain good marks. However, the students gain knowledge and attain marks according to the quality of teaching delivered by the teacher and the quality of the students present in the class. To simulate this, suppose there are $n$ subjects (design variables, $j = 1, 2, \ldots, n$) offered to $m$ students (population size, $i = 1, 2, \ldots, m$). In any teaching-learning cycle (iteration, $k$), $M_j^k$ represents the mean result of the students in a specific subject $j$. The teacher is one of the most skilled, experienced, and highly learned persons in society. To simulate this, the best student (possible solution) in the whole population is regarded as the teacher. The difference between the result of the teacher and the mean result of the students in subject $j$ is given as:
$$\mathrm{Difference\_Mean}_j^k = r \, (X_{j,\mathrm{best}}^{k} - T_F \, M_j^k) \tag{14}$$
where $X_{j,\mathrm{best}}^{k}$ is the result of the teacher in subject $j$, $T_F$ implies the teaching factor that decides how much the mean value is changed, and $r$ implies a random number in the range 0 to 1. $T_F$ is not a parameter of the TLBO technique, and its value is either one or two [25]. Each possible solution (student) is improved by moving its position towards the position of the best possible solution (teacher), taking into account the present mean value of the possible solutions. To simulate this, the $i$th possible solution in the population at the $k$th teaching-learning cycle is updated as:
$$X_{j,i}^{\prime k} = X_{j,i}^{k} + \mathrm{Difference\_Mean}_j^k \tag{15}$$
If $X_i^{\prime k}$ is superior to $X_i^{k}$, then it is accepted; otherwise, it is rejected. Every accepted possible solution is retained and becomes the input to the student phase.
2) Student level:
Here, the students gain knowledge through mutual interaction. A student interacts randomly with another student in the class to improve his or her knowledge. When student (v) is superior to student (u), student (u) is moved towards student (v). Otherwise, student (u) is moved away from student (v). The learning procedure of this phase is given below.
Two students (possible solutions, $X_u^{\prime k}$ and $X_v^{\prime k}$) are randomly selected from the class (population), where $u$ and $v$ are two random integers in $[1, m]$ and $u \neq v$.
If $f(X_v^{\prime k})$ is superior to $f(X_u^{\prime k})$
$$X_{j,u}^{\prime\prime k} = X_{j,u}^{\prime k} + r \, (X_{j,v}^{\prime k} - X_{j,u}^{\prime k})$$
Else
$$X_{j,u}^{\prime\prime k} = X_{j,u}^{\prime k} + r \, (X_{j,u}^{\prime k} - X_{j,v}^{\prime k})$$
Endif
where $f$ implies the fitness function (FF), which is utilized for finding the fitness value of possible solutions, and $X_{j,u}^{\prime\prime k}$ refers to the $j$th design variable of the altered possible solution from the student level at the $k$th teaching-learning iteration.
Afterward, the fitness value of $X_u^{\prime\prime k}$ is estimated.
If $f(X_u^{\prime\prime k})$ is superior to $f(X_u^{\prime k})$
Accept $X_u^{\prime\prime k}$
Else
Retain $X_u^{\prime k}$
End if
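The two TLBO levels can be sketched as a short, self-contained Python routine. This is a minimal sketch under several assumptions and not the tuning code of the study: "superior" is taken to mean a lower fitness value, the function name tlbo and its parameters are hypothetical, and the quadratic toy_loss over the base-10 logarithm of the learning rate is an invented stand-in for the validation loss of the CNN-RNN model.

```python
import random


def tlbo(fitness, bounds, pop_size=10, iters=50, seed=0):
    """Minimise `fitness` inside the box `bounds` using teacher and student levels."""
    rng = random.Random(seed)
    n = len(bounds)

    def clip(x):
        # Keep every design variable inside its allowed range.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]

    for _ in range(iters):
        # Teacher level: move each student by r * (teacher - T_F * mean).
        teacher = pop[fit.index(min(fit))]
        mean = [sum(x[j] for x in pop) / pop_size for j in range(n)]
        for i in range(pop_size):
            t_f = rng.choice((1, 2))                       # teaching factor, either 1 or 2
            cand = clip([pop[i][j] + rng.random() * (teacher[j] - t_f * mean[j])
                         for j in range(n)])
            f_cand = fitness(cand)
            if f_cand < fit[i]:                            # accept only if superior
                pop[i], fit[i] = cand, f_cand

        # Student level: move towards a better peer, away from a worse one.
        for u in range(pop_size):
            v = rng.choice([s for s in range(pop_size) if s != u])
            sign = 1.0 if fit[v] < fit[u] else -1.0
            cand = clip([pop[u][j] + rng.random() * sign * (pop[v][j] - pop[u][j])
                         for j in range(n)])
            f_cand = fitness(cand)
            if f_cand < fit[u]:
                pop[u], fit[u] = cand, f_cand

    best = fit.index(min(fit))
    return pop[best], fit[best]


def toy_loss(x):
    # Hypothetical stand-in for validation loss as a function of log10(learning rate).
    return (x[0] + 3.0) ** 2


best_x, best_f = tlbo(toy_loss, bounds=[(-5.0, -1.0)])
print("log10(lr) =", best_x[0], "loss =", best_f)
```

Searching over the logarithm of the learning rate rather than the raw value is a design choice of this sketch; in practice each fitness evaluation would require training and validating the network, so the population size and iteration count would be chosen with that cost in mind.
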

B. Results Analysis
This section examines the sarcasm detection performance of the DLE-SDC technique from several aspects. The DLE-SDC technique is investigated in terms of different measures, namely precision, recall, accuracy, and F-measure. The confusion matrix generated by the DLE-SDC technique on the classification of sarcasm is depicted in Fig. 5. The figure shows that the DLE-SDC technique has classified a total of 14166 instances as non-sarcastic and 12639 instances as sarcastic.
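For reference, the four measures named above follow directly from the cells of a binary confusion matrix, as the short sketch below shows. The counts used in the example are hypothetical and are not the values of Fig. 5, and the function name classification_metrics is introduced here only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)                  # share of predicted sarcastic that is correct
    recall = tp / (tp + fn)                     # share of actual sarcastic that is found
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all instances classified correctly
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical counts, with "sarcastic" treated as the positive class.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=70)
print(metrics)
```
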
A brief classification results analysis of the DLE-SDC technique with other DL techniques is given in Table I.
A loss graph analysis of the DLE-SDC technique under a varying number of epochs is shown in Fig. 8. The figure shows that the training and validation loss values decrease as the number of epochs rises. In particular, the validation loss appears to be lower than the training loss of the DLE-SDC technique. Finally, a comprehensive comparative study of the DLE-SDC technique with other techniques is given in Table II [27].
From the above results analysis, it is observed that the DLE-SDC technique has accomplished maximum sarcasm detection performance and can be employed to detect sarcasm in online social media content.

V. CONCLUSION
This paper has presented a new DLE-SDC technique to identify and classify sarcasm using DL. The proposed DLE-SDC technique comprises different stages of operations, namely pre-processing, word vector representation, CNN-RNN based classification, and TLBO based hyperparameter optimization. The CNN-RNN technique involves the BiLSTM model for the detection and classification of sarcasm. To increase the sarcasm detection performance of the CNN-RNN model, the TLBO algorithm is applied to determine its optimal learning rate. A wide range of simulations was carried out on benchmark datasets, and the results were validated in terms of different measures. The simulation outcomes pointed out the superiority of the DLE-SDC technique over recent state-of-the-art techniques. As a part of future work, the sarcasm detection performance can be extended through the design of feature selection and clustering techniques.