Evaluation of Sentiment Analysis based on AutoML and Traditional Approaches

Abstract: AutoML, or Automated Machine Learning, is a set of tools that reduces or eliminates the data-science expertise needed to build machine learning or deep learning models. These tools can automatically discover suitable models and pipelines for a given dataset with very little user interaction. The concept arose because developing a machine learning or deep learning model with traditional methods is time-consuming and can be challenging even for experts. AutoML tools are now used in many areas, such as image processing and sentiment analysis. In this research, the authors evaluate sentiment analysis classification models built with AutoML and traditional approaches, covering both machine learning and deep learning. HyperOpt-Sklearn and TPOT were used as the AutoML libraries and Scikit-learn as the traditional library; for deep learning, Keras and Auto-Keras were used. With these libraries, two binary classification and two multi-class classification models were built, and the findings of each AutoML and traditional approach were then compared. The authors found that building a machine learning or deep learning model manually gave better results than using an AutoML approach.


I. INTRODUCTION
Machine learning (ML) is a subset of Artificial Intelligence (AI) that enables systems to learn automatically from data. Because ML has become a fast-moving area of both application and research, several new libraries have been introduced to make developers' work easier. Today, most industries use machine learning to improve the experience of their customers and users. For example, researchers predict that by 2025 the revenue of the AI and machine learning enterprise application market will reach nearly 31 billion US dollars; Fig. 1 shows the revenue generated and expected from machine learning and AI-related applications [1].
Moreover, more than 2000 research works have been carried out in the machine learning area [2]. These statistics show the importance of machine learning and the opportunity it offers students and developers to enter the field. Because machine learning is updated day by day, new subfields keep emerging; at present, one of the most prominent of them is Automated Machine Learning (AutoML).
AutoML marks a new era in machine learning and is currently a rapidly growing subfield that combines machine learning and data science. It provides a set of tools that reduce or eliminate the skills a developer needs to implement machine learning models [3]. The concept arose because developing a machine learning model with traditional methods requires considerable skill and time, and remains challenging even for experts [4]. Notably, automated machine learning first appeared in commercial solutions in the 1990s, which offered selected classification algorithms via grid search [5].
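Selecting among classification algorithms via grid search amounts to scoring every combination of algorithm and hyperparameter value on held-out data and keeping the best. The minimal sketch below illustrates the idea under stated assumptions: the toy dataset, the two hypothetical candidate models (k-nearest neighbors and nearest centroid) and the validation-accuracy criterion are chosen for illustration only and are not taken from [5] or from any AutoML library.

```python
from itertools import product
from math import dist

# Hypothetical toy data: (features, label); label 0 near the origin, 1 near (5, 5).
TRAIN = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((5, 5), 1), ((6, 5), 1), ((5, 6), 1)]
VALID = [((1, 1), 0), ((4, 5), 1), ((0.5, 0.2), 0), ((6, 6), 1)]


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def centroid_predict(train, x):
    """Assign x to the class whose mean training point is closest."""
    groups = {}
    for point, label in train:
        groups.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(coord) / len(points) for coord in zip(*points))
        for label, points in groups.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], x))


MODELS = {"knn": knn_predict, "centroid": centroid_predict}
# Each algorithm maps to its own hyperparameter grid (empty = no hyperparameters).
SEARCH_SPACE = {"knn": {"k": [1, 3, 5]}, "centroid": {}}


def validation_accuracy(model, params, train, valid):
    """Fraction of validation points the model labels correctly."""
    return sum(model(train, x, **params) == y for x, y in valid) / len(valid)


def grid_search(space, train, valid):
    """Try every (algorithm, hyperparameter) combination and keep the best."""
    best = None
    for name, grid in space.items():
        keys = list(grid)
        for values in product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            score = validation_accuracy(MODELS[name], params, train, valid)
            if best is None or score > best[2]:
                best = (name, params, score)
    return best


print(grid_search(SEARCH_SPACE, TRAIN, VALID))
```

Real AutoML tools search far larger spaces (pre-processing steps, many model families, continuous hyperparameters), which is why libraries such as TPOT (genetic programming) and HyperOpt-Sklearn (Hyperopt's search algorithms, such as random search or TPE) replace exhaustive grids with guided search.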
According to current statistics, AutoML is used by almost everyone involved in data science and machine learning, including domain experts, students, and governments. These AutoML libraries are mostly used by people without machine learning knowledge, students, and beginner developers. Fig. 2 shows how AutoML usage is distributed across the experience levels of students and employees working in machine learning and data science [6].
These statistics show that more than 20% of developers and students with less experience have used AutoML libraries. Because of this, several automated machine learning libraries have been introduced recently, both for conventional machine learning algorithms and for deep learning. Among them, TPOT, HyperOpt-Sklearn, and Auto-sklearn are very popular; they are automated counterparts of the well-known machine learning library Scikit-learn [5]. For the well-known deep learning library Keras, there is an automated library named Auto-Keras [7]. AutoML is now used in various areas of data science and machine learning, such as image classification and sentiment analysis. The main goal of this research is to evaluate sentiment analysis based on AutoML and traditional approaches.
Apart from these libraries, there are several cloud-based AutoML platforms. Google provides its own AutoML platform on Google Cloud Platform, named Google AutoML [8], and Amazon Web Services (AWS) provides an AutoML toolkit named AutoGluon [9]. Because of the rapid growth of AutoML, it is also used in sentiment analysis projects. The evaluation in this research is carried out by implementing sentiment analysis models based on AutoML and traditional approaches, with Python as the main programming language. As machine learning libraries, the authors used Scikit-learn and its automated counterparts TPOT and HyperOpt-Sklearn. The research also evaluated deep learning approaches, using the well-known deep learning library Keras and its automated counterpart Auto-Keras. For the implementation, the authors chose four datasets available on Kaggle (COVID-19 Tweets, Trip Advisor Hotel Reviews, Spam Messages, and IMDB Movie Reviews) to build sentiment-analysis classification models with the above libraries. The models were then evaluated by comparing the AutoML and traditional approaches.
Finally, the researchers hope that this evaluation will be useful for newcomers to data science and for students who intend to use AutoML libraries in their projects, and that it will show the importance of understanding the fundamentals of machine learning. The next section discusses past research related to the authors' work and how the proposed work differs from it.

II. LITERATURE REVIEW
AutoML is one of the most actively researched areas in machine learning, and it has driven considerable growth and interest in building machine learning applications. However, AutoML is a new and evolving technology, and much research is still being conducted in this area, so each AutoML library may have its own pros and cons. Moreover, the performance of these libraries can vary across tasks such as image classification, time-series prediction, and sentiment analysis. Researchers have therefore carried out several studies to evaluate AutoML libraries.
A research study named Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools, by several researchers from the USA, provided a thorough evaluation of current AutoML tools. The study investigated AutoML tools such as TPOT and Auto-WEKA, along with several others, as well as cloud AutoML platforms such as H2O AutoML and Google AutoML. The authors evaluated these libraries by implementing binary classification, multi-class classification, and regression models on different datasets, and reported that most AutoML tools obtained reasonable results and performance [11].
Testing the Robustness of AutoML Systems is a study by two researchers from the University of Helsinki that evaluated the robustness of three main AutoML libraries, TPOT, H2O, and Auto-Keras, on image processing models. They used two datasets, one containing digits and one containing fashion images. With cleaned data, all the AutoML libraries achieved more than 80% accuracy. The table in Fig. 4 summarizes the accuracy percentages they obtained at each step of the training [12].
A journal article named AutoML: Exploration v.s. Exploitation investigated whether AutoML libraries achieve better performance by focusing on the most promising classifiers for a given dataset. The authors used the Auto-sklearn, TPOT, and ATM libraries. However, they reported that empirical results across these libraries show that exploiting the most promising classifiers does not achieve statistically better performance [13].
'AutoML: A survey of the state-of-the-art' is another journal article; it discusses the performance of Neural Architecture Search (NAS), which can be considered the subset of AutoML concerned with building deep learning models [14]. The survey reports that automatically generated models obtained higher perplexity than human-designed models on the Penn Treebank (PTB) dataset, and explains that a higher perplexity corresponds to a less accurate model. Fig. 5 shows these results for each automatically generated and human-designed model [15].
Sentiment Analysis on Google Cloud Platform is a research project carried out with the Google Natural Language API and Google AutoML on datasets gathered from Kaggle. It obtained nearly 57% accuracy with the Google Natural Language API and 90% accuracy with the Google AutoML platform [16]. A web article named Machine Learning - Auto ML vs Traditional methods compared a Python code-based approach with SAC Smart Predict, a code-free system; the code-based approach used the Scikit-learn library with the Random Forest algorithm [17].
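Perplexity, the metric behind the NAS comparison above, is the exponential of the average negative log-probability a language model assigns to the tokens that actually occur, so a lower value means the model predicts the text better. A small sketch, assuming the per-token probabilities are already available (the values in the example are hypothetical):

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood of the observed tokens).

    `token_probs` holds the probability the model assigned to each token
    that actually occurred in the evaluation text.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Giving every correct token probability 0.25 is as uncertain as a uniform
# choice among 4 words, so the perplexity is 4; probability 0.5 gives 2.
print(round(perplexity([0.5, 0.5, 0.5]), 6), round(perplexity([0.25, 0.25, 0.25]), 6))
```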

Another research work, An Open Source AutoML Benchmark, introduced an open, ongoing, and extensible benchmark framework that follows best practices and avoids common mistakes. The benchmark covered the open-source AutoML tools Auto-WEKA, Auto-sklearn, TPOT, and H2O AutoML, tested on 39 datasets, with the main aim of comparing the accuracy and performance of each library. The benchmark results were published as a web application [18].
These are some recent works relevant to the proposed research. The main goal of this study is to evaluate sentiment analysis models based on AutoML and traditional approaches, using both automated machine learning libraries and automated deep learning libraries. The next sections discuss the methodology of the research, its distinctive features, and the final results of the analysis.

III. METHODOLOGY
The proposed research evaluates AutoML and traditional code-based machine learning for sentiment analysis. Python was chosen as the main programming language because it supports various AutoML libraries. The research evaluates both machine learning and deep learning AutoML libraries: two automated machine learning libraries, TPOT and HyperOpt-Sklearn, and one automated deep learning library, Auto-Keras. Since the automated machine learning libraries are built on the well-known Scikit-learn library, and Auto-Keras is built on the TensorFlow Keras deep learning library, the authors used Scikit-learn and Keras to build the traditional code-based models.
For the evaluation, the authors implemented both binary and multi-class sentiment classification models using four main datasets. The COVID-19 Tweets [19] and Trip Advisor Hotel Reviews [20] datasets were used for multi-class classification, and the Spam Message [21] and IMDB Movie Reviews [22] datasets were used for binary classification. Table I summarizes the datasets and libraries used in this evaluation.
In the initial stage of the implementation, the data were pre-processed by removing null values and special characters, and word vectors were then created for the texts. For the machine learning models, the Scikit-learn TfidfVectorizer was used with the NLTK word_tokenize function to tokenize the texts and compute the tf-idf scores of the vectors. For the deep learning models, the Keras Tokenizer was used to tokenize the words, and the Keras pad_sequences function was used to bring all tokenized sequences to the same length.
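The machine learning pre-processing described above can be sketched without external libraries to show what each step does. This is an illustration under assumptions, not the authors' code: "special characters" is taken to mean anything other than ASCII letters, digits and whitespace; a plain whitespace split stands in for NLTK's word_tokenize; and the weighting follows scikit-learn's documented TfidfVectorizer defaults (raw term counts, smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1, L2-normalized rows).

```python
import math
import re
from collections import Counter


def clean(texts):
    """Drop missing or empty rows, lowercase, and replace special characters with spaces."""
    cleaned = []
    for text in texts:
        if text is None or not text.strip():
            continue  # null or blank row
        cleaned.append(re.sub(r"[^a-z0-9\s]", " ", text.lower()))
    return cleaned


def tfidf(docs):
    """Return (vocabulary, TF-IDF rows) with smoothed idf and L2-normalized rows."""
    tokenized = [doc.split() for doc in docs]  # stand-in for NLTK word_tokenize
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing an empty row by 0
        rows.append([v / norm for v in row])
    return vocab, rows


raw = ["Great hotel!!", None, "Terrible, dirty room :(", "great room"]
vocab, X = tfidf(clean(raw))
print(vocab)
```

In practice, TfidfVectorizer(tokenizer=word_tokenize) performs the tokenization and weighting in one step; the sketch only makes the arithmetic visible. Words that appear in many reviews (such as "great" here) receive a lower weight than rarer, more distinctive words.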
The authors then built several classification models with Scikit-learn, tuned their hyperparameters, and recorded the results. Five main classification algorithms were used to build the models manually: XGBClassifier, RandomForestClassifier, LGBMClassifier, LogisticRegression, and DecisionTreeClassifier. The researchers then built AutoML models for the same datasets, allowing the libraries to perform further pre-processing of their own, and trained each AutoML model for 100 iterations. Both the automated and the traditional machine learning models were evaluated mainly by accuracy, together with precision, recall, and F1-score.
www.ijacsa.thesai.org
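The evaluation metrics named above can be computed from counts of correct and incorrect predictions per class. The sketch assumes macro averaging (the unweighted mean over classes) for the multi-class datasets; the paper does not state which averaging was used, and scikit-learn's classification_report also reports weighted averages. The labels in the example are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, per-class (precision, recall, F1), macro averages)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    per_class = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    macro = tuple(sum(m[i] for m in per_class.values()) / len(labels) for i in range(3))
    return accuracy, per_class, macro


acc, per_class, macro = classification_metrics(
    ["pos", "pos", "neg", "neg", "neu"],
    ["pos", "neg", "neg", "neg", "pos"],
)
print(acc, per_class, macro)
```

Macro averaging gives each class equal weight, so a poorly predicted class (such as "neu" in the example) pulls the averages down even when overall accuracy looks acceptable.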
Following the same approach, deep learning models were built for each dataset with Keras and Auto-Keras. For the traditional deep learning models, optimization was done manually by tuning the hyperparameters, mainly using Keras Embedding, LSTM, and GlobalMaxPool1D layers. The Auto-Keras models were trained for 50 to 100 trials. The models were evaluated by their accuracy and loss during training and validation, together with the other evaluation methods used for the classification models. Fig. 6 describes the flow of the evaluation process for one dataset. Notably, when an automated machine learning or deep learning library produced a model, the researchers fine-tuned it further by implementing it manually with the model parameters output by the library, because the accuracy reported by the AutoML tools refers to the overall training process rather than to the best model. This evaluation is intended to identify the pros and cons of AutoML versus traditional approaches to machine learning. The next section discusses the data and the results obtained in the evaluation process.
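The Keras Tokenizer and pad_sequences step from the pre-processing stage produces the fixed-length integer sequences that an Embedding layer consumes. The following dependency-free sketch mimics that behavior under simplifying assumptions: words are indexed by descending frequency starting at 1 (0 is reserved for padding), tokenization is a lowercase whitespace split rather than Keras's character filters, and both padding and truncation happen at the start of the sequence, which is the documented 'pre' default of pad_sequences.

```python
from collections import Counter


def fit_word_index(texts):
    """Map words to integers by descending frequency; index 0 is left for padding."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # most_common() keeps first-seen order among words with equal counts.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(texts, word_index, num_words=None):
    """Replace each known word by its index; unknown words (and, if num_words is set,
    words with index >= num_words) are dropped."""
    sequences = []
    for text in texts:
        seq = [word_index[w] for w in text.lower().split() if w in word_index]
        if num_words is not None:
            seq = [i for i in seq if i < num_words]
        sequences.append(seq)
    return sequences


def pad_sequences(sequences, maxlen, value=0):
    """Left-truncate to the last `maxlen` tokens, then left-pad with `value`."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:] if maxlen else []
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


texts = ["good movie", "bad movie very bad", "good"]
index = fit_word_index(texts)
print(pad_sequences(texts_to_sequences(texts, index), maxlen=3))
```

Padding at the start keeps the most recent tokens next to the end of the sequence, which is where a recurrent layer such as an LSTM reads last; the padded matrix has one row per text and maxlen columns.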

IV. DATA AND RESULT
In the implementation process, the researchers built two binary classification and two multi-class classification sentiment analysis models using both AutoML and traditional approaches. The four datasets were obtained from the well-known dataset site Kaggle; Table II describes each dataset and the number of records it contains. In the evaluation, each dataset was split into 70% for training and 30% for testing. As described in the Methodology section, binary and multi-class classification models were built with both the AutoML and the traditional approaches. In summary, the traditional human-made models performed well and achieved higher accuracy than the AutoML models in most cases. Notably, the authors found that in some cases the AutoML libraries did not select the most suitable ML algorithm.
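A 70/30 split can be reproduced with a seeded shuffle so that every model is compared on the same test rows. The sketch is illustrative: the seed value is arbitrary, the paper does not say whether its split was stratified, and the stratified variant (which keeps each class's proportion in both parts) is included only as an option worth considering for datasets with uneven class sizes.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` with a fixed seed and return (train, test)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def stratified_split(rows, label_of, test_fraction=0.3, seed=42):
    """Split each class separately so train and test keep the class ratio."""
    by_label = {}
    for row in rows:
        by_label.setdefault(label_of(row), []).append(row)
    train, test = [], []
    for label in sorted(by_label):
        part_train, part_test = train_test_split(by_label[label], test_fraction, seed)
        train += part_train
        test += part_test
    return train, test


train, test = train_test_split(list(range(10)))
print(len(train), len(test))
```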
For multi-class classification on the Trip Advisor Hotel Reviews dataset, the traditional approach achieved 61% accuracy with the Logistic Regression classifier, and the manually built deep learning model achieved 82% accuracy. The AutoML models achieved 75.2% accuracy with a Decision Tree classifier selected by TPOT and 73.5% with a Random Forest classifier selected by HyperOpt-Sklearn, while Auto-Keras, the automated deep learning library, achieved 74% accuracy.
For the COVID-19 dataset, which was also used for multi-class classification, the XGBoost classifier achieved 71% accuracy and the deep learning model achieved 72% validation accuracy. Both AutoML libraries selected the Random Forest classification algorithm, reaching 60.6% accuracy with HyperOpt-Sklearn and 60.7% with TPOT. Auto-Keras achieved 57.2% accuracy within 50 trials.
Binary sentiment classification was evaluated as well. With manually built models, the Spam Message dataset reached 97% accuracy with the Random Forest algorithm and 97.6% validation accuracy with deep learning. With the AutoML libraries, it reached 93.4% accuracy with TPOT and 86.7% with HyperOpt-Sklearn, and Auto-Keras reached 89.2% validation accuracy.
The IMDB Movie Reviews dataset reached 89.7% accuracy with the Logistic Regression classifier and 89.9% validation accuracy with the manually built deep learning model. With automated machine learning, TPOT achieved 76.4% accuracy and HyperOpt-Sklearn 72.1%, outputting the Decision Tree and Random Forest classification algorithms, respectively, as the most accurate models. Auto-Keras, as the automated deep learning library, achieved 86.5% validation accuracy. Table III summarizes the results obtained by the authors for all the algorithmic approaches.
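The gap between the manual and automated approaches can be checked directly from the accuracies reported in this section. The sketch pairs, for each dataset and model family, the best manually built model with the best AutoML model (TPOT in every machine learning case, Auto-Keras for deep learning) and computes the percentage-point difference; no numbers beyond those reported above are used, and the deep learning figures are the validation accuracies as reported.

```python
# Accuracy (%) reported in this section: (best manual model, best AutoML model).
RESULTS = {
    "Trip Advisor (ML)": (61.0, 75.2),  # Logistic Regression vs TPOT
    "Trip Advisor (DL)": (82.0, 74.0),  # Keras vs Auto-Keras
    "COVID-19 (ML)": (71.0, 60.7),      # XGBoost vs TPOT
    "COVID-19 (DL)": (72.0, 57.2),      # Keras vs Auto-Keras
    "Spam (ML)": (97.0, 93.4),          # Random Forest vs TPOT
    "Spam (DL)": (97.6, 89.2),          # Keras vs Auto-Keras
    "IMDB (ML)": (89.7, 76.4),          # Logistic Regression vs TPOT
    "IMDB (DL)": (89.9, 86.5),          # Keras vs Auto-Keras
}


def accuracy_gaps(results):
    """Percentage-point lead of the manual model over the best AutoML model."""
    return {name: round(manual - auto, 1) for name, (manual, auto) in results.items()}


gaps = accuracy_gaps(RESULTS)
for name, gap in gaps.items():
    print(f"{name:<18} {gap:+5.1f}")
wins = sum(gap > 0 for gap in gaps.values())
print(f"manual model ahead in {wins} of {len(gaps)} comparisons")
```

With these figures, the manual model leads in 7 of the 8 comparisons; the exception is the Trip Advisor machine learning comparison, where TPOT's 75.2% exceeds the 61% of the manually tuned Logistic Regression classifier.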
The results summarized in Table III show that the accuracy of the automated machine learning and deep learning libraries was lower in most cases than that of the models built manually by the authors with Scikit-learn and Keras. The accuracy percentages of TPOT and HyperOpt-Sklearn were also quite similar to each other, although the algorithms they suggested differed. When building the automated models, the libraries were given full control over further pre-processing; for example, HyperOpt-Sklearn scaled the data and reduced its dimensionality using PCA. Fig. 7 shows how the HyperOpt-Sklearn library scaled the data before training the models.
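Min-max scaling, one of the pre-processing steps HyperOpt-Sklearn applied, maps each feature x to (x - min) / (max - min) so that the training values fall in [0, 1]. The sketch fits the minimum and maximum on the training rows only and reuses them for new rows, mirroring the fit-then-transform usage of scikit-learn's MinMaxScaler; the toy values are hypothetical, and mapping a constant column to 0 is a choice made for this sketch.

```python
def fit_min_max(rows):
    """Record each column's minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(col) for col in columns], [max(col) for col in columns]


def transform_min_max(rows, params):
    """Apply (x - min) / (max - min) per column; constant columns map to 0."""
    lows, highs = params
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


train = [[1.0, 10.0], [3.0, 10.0], [5.0, 30.0]]
params = fit_min_max(train)
print(transform_min_max(train, params))
print(transform_min_max([[7.0, 20.0]], params))  # unseen rows may fall outside [0, 1]
```

Fitting on the training split alone keeps information from the test rows out of the pre-processing, so the reported test accuracy is not inflated.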
Fig. 8 shows how the HyperOpt-Sklearn library applied dimensionality reduction with principal component analysis (PCA) to the spam message dataset. In addition, for all four datasets, the HyperOpt-Sklearn library suggested the Random Forest classification algorithm combined with a pre-processing technique. However, the results show that in some cases these AutoML libraries did not find the most accurate ML algorithm.
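PCA projects centered data onto the directions of greatest variance, which are the eigenvectors of the covariance matrix with the largest eigenvalues. To keep the illustration dependency-free, the sketch finds them by power iteration with deflation; this is an assumption for illustration only, since library implementations such as scikit-learn's PCA use a singular value decomposition instead. Power iteration works well for small matrices whose leading eigenvalues are well separated.

```python
import math


def pca(rows, n_components=1, iterations=500):
    """Return (components, projections) for the top principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [1.0 / math.sqrt(d)] * d  # start must not be orthogonal to the leading eigenvector
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left along the current direction
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    projections = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                   for row in centered]
    return components, projections


components, projections = pca([[1, 2], [2, 4], [3, 6], [4, 8]])
print(components[0])
```

For points lying exactly on the line y = 2x, the single component recovered is the unit vector along (1, 2), and it carries all of the variance; PCA on TF-IDF features works the same way with far more columns.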
These are the results obtained in the evaluation process. The next section discusses these results, the conclusions the researchers drew from them, and their recommendations on using automated machine learning and deep learning libraries for sentiment analysis.

V. DISCUSSION
The main purpose of this research is to evaluate sentiment analysis based on AutoML and traditional code-based approaches. The evaluation used both machine learning and deep learning for multi-class and binary sentiment analysis, and the results were presented in the previous section.
Based on the results, the researchers make several suggestions. First, for students or beginner developers in ML and data science, a traditional code-based approach is better than using AutoML libraries. However, for binary classification models, the authors see little risk in using AutoML libraries. The accuracy of Auto-Keras was lower than that of the traditional approach, although the gap was small in some cases (for example, 86.5% versus 89.9% validation accuracy on the IMDB dataset). The evaluation also showed that the performance of the AutoML libraries depends on the dataset.
A further suggestion from this evaluation is that knowledge of hyperparameter tuning, pre-processing, and machine learning algorithms is essential for beginners implementing sentiment analysis models. As mentioned earlier, the authors built five main classification models and performed several rounds of hyperparameter tuning on them. Pre-processing the data is also very useful when building sentiment analysis models, and it is one reason why the researchers achieved higher accuracy than the AutoML models. The automated libraries were observed to apply several pre-processing techniques, such as min-max scaling and Principal Component Analysis (PCA).
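Hyperparameter tuning of the kind recommended here can be as simple as a random search: sample configurations from a search space for a fixed budget (100 here, echoing the 100 iterations used for the AutoML runs) and keep the best validation score. In the sketch, the hyperparameter names C and penalty are borrowed from scikit-learn's LogisticRegression for familiarity, and toy_objective is a hypothetical stand-in for training a model and measuring its validation accuracy.

```python
import math
import random


def sample(space, rng):
    """Draw one configuration: lists are categorical, (low, high) tuples are log-uniform."""
    config = {}
    for name, choices in space.items():
        if isinstance(choices, list):
            config[name] = rng.choice(choices)
        else:
            low, high = choices
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
    return config


def random_search(objective, space, budget=100, seed=0):
    """Evaluate `budget` random configurations and return the best (score, config)."""
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        config = sample(space, rng)
        score = objective(config)
        if best is None or score > best[0]:
            best = (score, config)
    return best


# Hypothetical stand-in for "train, then return validation accuracy";
# it peaks at C = 1.0 with the 'l2' penalty.
def toy_objective(config):
    penalty_bonus = 0.05 if config["penalty"] == "l2" else 0.0
    return 0.8 + penalty_bonus - 0.1 * abs(math.log10(config["C"]))


SPACE = {"C": (1e-3, 1e3), "penalty": ["l1", "l2"]}
score, config = random_search(toy_objective, SPACE, budget=100)
print(round(score, 3), config)
```

Sampling C on a log scale spreads the budget evenly across orders of magnitude, which suits regularization strengths whose useful values can differ by factors of ten or more.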
The authors also propose that anyone who obtains a model from an AutoML platform should tune that model further and try similar alternative approaches. For example, when the Auto-Keras library returns a model, it is worth changing the layers of that model and evaluating the result. These are the main suggestions from this evaluation for those who intend to use AutoML libraries for sentiment analysis, and they may be useful for image classification as well.
The researchers did not face legal or social issues during the implementation. To improve the reliability of the findings, the authors used four datasets and evaluated both multi-class and binary sentiment classification. They also evaluated both automated machine learning and automated deep learning libraries to improve the reliability and validity of the evaluation.

VI. CONCLUSION
This research evaluated sentiment analysis based on AutoML and traditional code-based approaches and proposed several suggestions for anyone who intends to use AutoML libraries for sentiment analysis. The following limitations were identified in this work.
1) Only two popular automated machine learning libraries were evaluated.
2) Only four datasets were used in the investigation.
3) Code-free AutoML tools were not considered.
The authors identified the following future enhancements for this research.
1) Evaluate cloud-based AutoML platforms such as Google AutoML.
3) Evaluate the performance of AutoML and traditional approaches in other areas of machine learning, such as image classification and time-series analysis.
4) Find a method that proposes the most suitable algorithmic approach for a user's project by analyzing past ML projects.