An Ensemble Multi-layered Sentiment Analysis Model (EMLSA) for Classifying the Complex Datasets

Abstract: Sentiment analysis is a domain that analyzes the feelings and emotions of users based on their text messages. Applied to short messages, reviews in online social media (OSM), and social networking site (SNS) messages, it characterizes the opinion expressed in the given text. Processing short texts and SNS messages is difficult because they generally contain little detailed information. Solving this issue requires combining advanced techniques to give accurate results. This paper develops an Ensemble Multi-Layered Sentiment Analysis Model (EMLSA) that exploits trust-based sentiment analysis on various real-time datasets. EMLSA combines VADER (Valence Aware Dictionary and sEntiment Reasoner) with Recurrent Neural Networks (RNNs). VADER is a lexicon- and rule-based sentiment analysis model that predicts the sentiments of the input datasets, and its predictions are used for training. Term frequency-inverse document frequency (TF-IDF) is used for feature extraction. Word-Level Embeddings (WLE) and Character-Level Embeddings (CLE) are the two models that improve the analysis of short texts and single words. The proposed model was applied to four real-time datasets: Amazon, eBay, Trip-advisor, and IMDB Movie Reviews. The performance is analyzed using sensitivity, specificity, precision, accuracy, and F1-score.


I. INTRODUCTION
Sentiment Analysis (SA) is the process of finding and categorizing the opinions that people express in text, voice, and video. SA, also called opinion mining (OM), is a natural language processing (NLP) task that finds the emotions behind a body of text [1]. Opinions are expressed in various domains, such as movie reviews, e-commerce reviews, and Twitter posts. Every day, users of social media platforms generate millions of reviews regarding products, movies, and general topics [2]. An automated system is required to analyze the users' views, opinions, and sentiments. SA mainly focuses on finding non-trivial, emotional information collected from various online social media sources [3]. Sentiment analysis can be applied to whole documents, to phrases, and to single words. Finally, sentiment analysis divides the reviews into three types, positive, negative, and neutral, based on the text data. Sentiment analysis helps e-commerce applications increase the sales of specific products [4] [5].
Natural language processing (NLP) mainly focuses on two aspects: human language understanding and generation. Analyzing natural language with the existing models is a challenging task. Its applications include speech recognition, text analysis, question answering, and speech synthesis [6]. Two significant areas of NLP are sentiment analysis and emotion recognition, and these two areas differ in the aspects they address. "Emotion detection" is the domain that identifies feelings such as happiness, sadness, and depression from the user's expressions. There is a significant connection between "sentiment analysis" and "emotion detection" [7]. Users express their emotions through text, video, and audio.
Sometimes sentiment analysis goes beyond people's opinions and views to emotions such as sadness, happiness, and anger [8]. Sentiment analysis is required to act on the feedback of users or customers [9] [10] [11]. This paper describes sentiment analysis across several domains using deep learning algorithms integrated with advanced fine-tuned models. The proposed approach focuses on performing sentiment analysis on multiple domains and analyzing the trust-based reviews in the input dataset. The proposed method also addresses aspect-based, multilingual, and emotion-based sentiment analysis.

II. LITERATURE SURVEY
T. Gu et al. [12] proposed a novel sentiment analysis approach called MBGCV to increase sentiment classification performance. MBGCV combines BiGRU, CNN, and VIB models. The model obtains high-level sentiment features from the given datasets, and a real-time review dataset is used to analyze its performance. M. K. Hayat et al. [13] introduced a DL model combined with a taxonomy-based approach to solve various issues in sentiment analytics. H. Liu [14] presents a comparative study of lexicon-, ML-, and DL-based models that addresses several accuracy issues. Various real-time datasets are used for the experiments and the analysis of sentiments. P. Gupta et al. [15] proposed a lexicon-based model that classifies Twitter data about COVID-19. The model analyzes the Twitter data based on the medicines, situations, and conditions faced by users during the lockdown. It aims to identify the positive and negative opinions regarding the lockdown situation and the performance of the Indian government. Linear SVC is used to classify the data.
A. Elouardighi et al. [16] introduced a lexicon-based model combined with N-grams and TF-IDF. The approach is applied to Arabic-language comments collected from Facebook about the 2016 Legislative Elections in Morocco. Several ML algorithms, namely NB, RF, and SVM, are used for performance evaluation and produced effective sentiment results. P. Vyas et al. [17] introduced a framework for sentiment analysis regarding COVID-19. The framework extracts positive, negative, and neutral sentiments from Twitter data and applies various ML algorithms for classification. R. Khan et al. [18] introduced a deep LSTM model that predicts sentiment polarity and emotions on the sentiment140 dataset. The accuracy of the model is 90.23%, which is high compared with previous models. A. S. Imran et al. [19] introduced an LSTM model for detecting emotions and sentiments in text messages collected from Twitter. The main drawback of this model is its lack of accuracy on several emotions, such as bad, good, and anger. D. Antonakaki et al. [20] describe several DL models for sentiment analysis. The authors mainly focus on three areas: fake news, spam content, and threatening messages posted on Twitter. Twitter data is used for the performance evaluation, and the result analysis indicates improved sentiment detection. H. Strobelt et al. [21] proposed a model called LSTMVIS that processes the complex patterns present in various applications. S. Kumar et al. [22] proposed a hybrid recommender system that was applied to a movies dataset. The model combines CF and CBF to provide a better recommendation system based on sentiment analysis, and it analyzes present trends, people's sentiments, and users' responses. S. Bhatia [23] proposed a novel graph model that analyzes duplicate phrases. The approach focuses on correcting sentences by using graphs, and PCA is used to summarize the text and reduce the dimensions. The method achieved better opinion mining based on sentiments.
S. Davis et al. [24] discussed various works on analyzing customer reviews from e-commerce datasets. Their comparative study applies the approach to multiple user review datasets. M. A. Tayal et al. [25] presented an integrated system based on several operations, such as pre-processing, which is used to remove ambiguity from the given dataset. The method mainly combines Semantic Sentence Similarity with n-gram co-occurrence relations within specific sentences. Finally, the model is applied to several benchmark datasets, and the performances of existing and proposed models are analyzed. E. Aslanian et al. [26] proposed a hybrid recommender system (HRS) that improves accuracy. The approach combines a feature relationship matrix with collaborative filtering to solve the cold-start problem, and it achieves better accuracy than other existing algorithms. E. Cambria [27] proposed an automated approach for analyzing sentiments based on emotions. The system combines emotions and reviews and gives better performance.
C. Du et al. [28] proposed a new classification approach that classifies sentiment data using an advanced feature extraction technique. A softmax classifier is adopted to increase the system's performance. The approach achieves high F1-scores on two datasets. Maria Giatsoglou et al. [29] proposed a rapid and reliable model that detects the sentiment of opinions expressed in different languages. In this ML approach, text documents are represented by vectors and used to train a polarity classification model. The model is evaluated on four datasets containing reviews in Greek and English.

III. PROPOSED METHODOLOGY
The proposed methodology is built from several advanced models: the pre-trained DL model, a stemming model for pre-processing, TF-IDF for feature extraction, Word-Level Embeddings (WLE) for text analysis, VADER for training, and an RNN for classification of the text data. Fig. 1 shows the step-by-step process of implementation.

IV. VADER
This paper uses VADER in the training stage to analyze the sentiments of the given datasets. VADER is a lexical database combined with rule-based sentiment analysis. The lexicon collects features (e.g., words) classified as positive or negative according to their sentiment polarity. VADER reports positivity and negativity scores and also the strength of the positive and negative sentiments. It is mainly based on a compound score computed by aggregating the valence scores of every word in the lexicon, fine-tuned according to the rules, and then normalized between -1 (highly negative) and +1 (highly positive). The compound score is therefore a single uni-dimensional measure of sentiment for a given sentence:

$$\text{compound} = \frac{x}{\sqrt{x^{2} + \alpha}}$$

where x is the sum of the valence scores of the constituent words and α is the normalization constant (default value 15).
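The normalization step can be illustrated with a minimal sketch. It assumes the form x / sqrt(x² + α) used by the reference VADER implementation; the function name `normalize_compound` and the example valence values are hypothetical and only serve the illustration.

```python
import math


def normalize_compound(valence_sum: float, alpha: float = 15.0) -> float:
    """Map an unbounded sum of valence scores into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


# Hypothetical valence scores for the words of one short review.
example_valences = [1.9, 0.0, 3.1, -0.4]
compound = normalize_compound(sum(example_valences))
print(f"compound score: {compound:.3f}")
```

Because the denominator is always larger in magnitude than the numerator, the score approaches but never reaches -1 or +1, which is what makes it usable as a single bounded sentiment measure.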

A. TF-IDF (Term Frequency-Inverse Document Frequency)
TF-IDF is a feature extraction approach that extracts the most informative words from the given documents and reviews. TF measures how frequently a word appears in the input data. The term frequency is the total number of times the word appears in a document, while the document frequency is the number of documents that contain the word. IDF is obtained from the total number of documents or reviews divided by the document frequency of the word. Every word is assigned a score by multiplying its TF by its IDF. Here, features are the words repeated across multiple reviews and documents.
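The scoring described above can be sketched on a toy corpus. This is an illustration under stated assumptions, not the paper's implementation: the logarithm in the IDF term and the length-normalized TF are common conventions that the text does not specify, and the function name `tf_idf` and the three sample reviews are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # TF (relative count) times IDF (assumed log-scaled ratio).
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical reviews used only for illustration.
reviews = [
    "great phone great battery",
    "poor battery",
    "great service",
]
weights = tf_idf(reviews)
for review, row in zip(reviews, weights):
    print(review, "->", row)
```

A word that occurs in few reviews, such as "poor", receives a higher weight than one shared by several reviews, such as "battery", and a word present in every document scores zero under this convention.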

B. Word-Level Embeddings (WLE)
WLEs are encoded as column vectors within the embedding matrix $W^{wrd} \in \mathbb{R}^{d^{wrd} \times |V^{wrd}|}$. Every column $W^{wrd}_{k} \in \mathbb{R}^{d^{wrd}}$ is the WLE of the kth word in the vocabulary. Using a matrix-vector product, the word w is transformed into its WLE $r^{wrd}$:

$$r^{wrd} = W^{wrd} v^{w}$$

where $v^{w}$ is a vector of size $|V^{wrd}|$ that has value 1 at index w and 0 in all other positions. The matrix $W^{wrd}$ is a parameter that is learned, and the WLE size $d^{wrd}$ is a hyper-parameter selected by the user.
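The matrix-vector product with a one-hot vector amounts to selecting one column of the embedding matrix. The following sketch shows this equivalence; the three-word vocabulary, the embedding size of 2, and the matrix values are hypothetical, and in the actual model the matrix would be learned during training.

```python
def one_hot(index, size):
    """Vector with 1.0 at `index` and 0.0 everywhere else."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def mat_vec(matrix, vector):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def column(matrix, index):
    """Extract one column of a list-of-rows matrix."""
    return [row[index] for row in matrix]


vocabulary = ["good", "bad", "phone"]  # hypothetical vocabulary
# Hypothetical embedding matrix: 2 rows (embedding size), one column per word.
embedding_matrix = [
    [0.5, -0.4, 0.1],
    [0.2, -0.7, 0.9],
]

w = vocabulary.index("bad")
embedding = mat_vec(embedding_matrix, one_hot(w, len(vocabulary)))
print(embedding)
```

In practice the product is never computed explicitly; frameworks implement it as a direct column (or row) lookup, which gives the same vector at far lower cost.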

C. RNN
In a conventional feed-forward network, adjacent layers are fully connected, but the units within a layer are not connected with each other. RNN [30] performs better on text sentiment datasets and on other tasks that involve sequential inputs. An RNN processes one element of the input sequence at a time and maintains in its hidden units a "state vector" that holds information about the history of the preceding elements of the sequence. If the outputs of the hidden units at different discrete time steps are treated as the outputs of different neurons in a deep multi-layer network, it becomes clear how back-propagation can be applied to train an RNN. RNNs are powerful dynamic systems, but training them is difficult because the back-propagated gradients either grow or shrink at each time step, so over many time steps they typically explode or vanish. As shown in Fig. 2, the neurons in the hidden layers receive inputs from other neurons at previous time steps. In this way, the RNN maps an input sequence with elements x t to an output sequence with elements o t , where each o t depends on all the previous x t′ (for t′ ≤ t). The same parameter matrices U, V, and W are used at every time step. The back-propagation algorithm is applied to the unfolded network on the right and computes the derivative of the overall error with respect to all the states s t and all the parameters.
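The recurrence can be made concrete with a minimal forward pass. The sketch assumes a vanilla RNN with a tanh activation and no bias terms, neither of which is specified in the text; the weight values, dimensions, and the function name `rnn_forward` are hypothetical. The same U, W, and V are reused at every time step.

```python
import math


def mat_vec(matrix, vector):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, U, W, V):
    """Run a vanilla RNN over a sequence and return (states, outputs)."""
    state = [0.0] * len(W)  # initial state vector, all zeros
    states, outputs = [], []
    for x_t in inputs:
        # s_t = tanh(U x_t + W s_{t-1}); tanh is an assumed activation.
        pre = [a + b for a, b in zip(mat_vec(U, x_t), mat_vec(W, state))]
        state = [math.tanh(p) for p in pre]
        states.append(state)
        # o_t = V s_t, so o_t depends on every x_t' with t' <= t.
        outputs.append(mat_vec(V, state))
    return states, outputs


# Hypothetical weights: 2-dimensional inputs, 3 hidden units, 1 output.
U = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
W = [[0.2, 0.0, -0.1], [0.0, 0.3, 0.1], [0.1, -0.2, 0.4]]
V = [[1.0, -1.0, 0.5]]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

states, outputs = rnn_forward(sequence, U, W, V)
print(outputs)
```

Changing only the last input leaves the earlier outputs untouched, which illustrates that each output depends on the current and previous inputs but not on later ones.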

1) Amazon dataset: The Amazon dataset consists of training and testing data. The training data contains 3 lakh (300,000) instances and the testing data contains 4 lakh (400,000) instances, drawn from the data of 568,000 customers, all of whom review products. This is an open-source, free dataset collected from Kaggle: https://www.kaggle.com/datasets/bittlingmayer/amazonreviews?resource=download.
2) Ebay dataset: The eBay dataset was created as part of a data science bootcamp project that aims to develop the best model for sentiment analysis. The author created this dataset for the research work using Python web-scraping scripts. The dataset consists of two files. The first, ebay_reviews.csv, has four attributes: product category, title review, content review, and rating. The total number of instances is 44757. The rating attribute is an integer, with one as the worst score and five as the best. The second file is a preprocessed file with the attributes rating, title review, and content review. The dataset is available at: https://www.kaggle.com/datasets/wojtekbonicki/ebayreviews/discussion?select=ebay_reviews.csv.
3) Trip-advisor: This dataset consists of 20k reviews of various hotels given by customers. The reviews were extracted from Trip-advisor, and the dataset is available on the Kaggle website at: https://www.kaggle.com/datasets/andrewmvd/tripadvisorhotel-reviews
4) IMDB Movie Reviews dataset: This dataset consists of 50k movie reviews, split into 40k testing and 10k training reviews. It contains 25k positive and 25k negative reviews. The data is available at:
Result Analysis: The performance of the existing and proposed algorithms is given in Tables I to IV. The comparative performance of RTA, IDER, and EMLSA is evaluated on the four datasets. The proposed model, EMLSA, performed better at analyzing sentiments on all the datasets than the existing models. The performance is measured by confusion-matrix measures such as precision, accuracy, recall, and specificity (see Figs. 3 to 6).
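The confusion-matrix measures named above follow directly from the counts of true and false positives and negatives. The sketch below assumes a binary (positive versus negative) setting; the counts passed in the example are made up for illustration and are not results from the paper, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard measures from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The function assumes every denominator is non-zero; a production version would need to handle a class with no predicted or no actual examples.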

VII. CONCLUSION
This paper describes a new DL model that can process complex datasets based on the reviews given by users. The proposed approach was applied to four benchmark datasets, and its comparative performance is reported in terms of sensitivity, specificity, accuracy, precision, and F1-score. The proposed DL model focuses on extracting every aspect of the input reviews. The word embedding models TF-IDF and Word2Vec, combined with the DL model, give high performance on the given input datasets. The proposed model achieved an accuracy of 98.45% for the Amazon dataset, 97.78% for the TripAdvisor dataset, 99.56% for the eBay dataset, and 99.8% for the IMDB dataset. Thus, the proposed model achieves higher accuracy than the existing models. In future work, multi-layered models are to be developed to improve sentiment and emotion detection. Various combined and integrated models are required to increase the performance further.