Fake News Classification Web Service for Spanish News by using Artificial Neural Networks

The use of digital media, such as social networks, has promoted the spreading of fake news on a large scale. Therefore, several Machine Learning techniques, such as artificial neural networks, have been used for fake news detection and classification. These techniques are widely used due to their learning capabilities. Besides, models based on artificial neural networks can be easily integrated into social media and websites to spot fake news early and avoid their propagation. Nevertheless, most fake news classification models are available only for English news, limiting the possibility of detecting fake news in other languages, such as Spanish. For this reason, this study proposes implementing a web service that integrates a deep learning model for the classification of fake news in Spanish. To determine the best model, the performance of several neural network architectures, including MLP, CNN, and LSTM, was evaluated using the F1 score. The LSTM architecture was the best, with an F1 score of 0.746. Finally, the efficiency of the web service was evaluated using temporal behavior as a metric, resulting in an average response time of 1.08 seconds.


I. INTRODUCTION
In the last few years, the use of technology and digital media has increased. Besides, due to the pandemic, people have chosen to use digital media, such as social networks, to get news [1]. However, a large amount of this unverified news could be fake [2]. Fake news has always existed, but nowadays it spreads more widely on open, unrestricted-access platforms. The propagation of fake news can lead to information theft, scams, and collective hysteria, or damage the reputation of a person or an institution [3]. To address the problem of fake news propagation, techniques based on machine learning, Natural Language Processing, and information retrieval have been proposed to detect this kind of news automatically and support decision-making [4]. Machine learning models have been used effectively in text classification to determine whether a text is racist, positive or negative, fake or genuine, and so on.
There are many classification models to determine whether a news item is authentic or fake [5]-[10]. In [11], the authors propose a Naïve Bayes model for fake news classification with an accuracy of 84%. In [12], web scraping and crawler techniques are applied to create a dataset of tweets and Facebook web links for training models. Since there are many machine learning models, such as decision tree, Support Vector Machine (SVM), Naïve Bayes, random forest, and logistic regression, some works have compared their performance in classifying news as fake or real. In [13], the comparison reveals that Naïve Bayes achieves the best performance on the tested datasets, with an accuracy of 66%, and, in [14], a random forest model classifies news as fake or real with an accuracy of 76.94%. In contrast, in [15], the authors propose a neural network architecture that outperforms previous approaches with an accuracy of 94.21% on test data. Nonetheless, achieving such high performance is challenging when detecting fake news in Spanish due to the limited datasets in this language.
Most machine learning models aim to classify news in English. Therefore, this study presents a machine learning based framework to classify news in Spanish as real or fake. The web service with the classifier model was developed following the CRISP-DM and SCRUM methodologies. The main contributions of this study are listed as follows:
• A machine learning model to classify fake news in Spanish. To the best of our knowledge, this model is the first one to classify news in Spanish by using MLP architecture.
• A website based on a web service architecture, built to access the classifier. That is, the machine learning model is integrated into the web service to be available in real-time.
The remainder of this paper is organized as follows. Section II describes previous studies related to fake news classification. Section III then presents the methodology applied in this study. Section IV describes the classification model with the integration into the web service. Section V presents the experimental results and evaluation metrics of the proposed classifier, while Section VI presents the conclusion of this work.

II. RELATED WORK
Since fake news can be harmful to individuals or organizations, effective ways to detect this kind of news have been developed through the years. The problem can be framed as a two-class (real/fake) classification. Therefore, this study reviews approaches for fake news classification based on machine learning classifiers, such as decision trees, support vector machines (SVM), and Naïve Bayes. In [16], the researchers propose a fake news detection system that uses the decision tree algorithm to classify the news from two sources. They then compare the results against those obtained with the SVM algorithm, showing that the decision tree is more accurate, with an accuracy of 97.67% and a precision of 94.60%, against 91.74% accuracy and 90.12% precision for SVM.
www.ijacsa.thesai.org
On the other hand, the work in [17] compares the performance of SVM with that of an apriori algorithm on a dataset that contains four attributes and 311 instances. In this case, the results show that SVM achieves a better accuracy of 91.87%, while the apriori algorithm only reaches an accuracy of 31.76%. This suggests that it is better to project the data into another space and look for discriminative features in that projected space rather than to look for them directly in the source data space [18]-[20]. In [21], the authors apply machine learning algorithms and use Natural Language Processing (NLP) to pre-process the data in order to increase the accuracy of the algorithms. The researchers pre-process a dataset from the Kaggle website of 20,800 news articles containing 10,387 real news items and 10,413 fake news items. The pre-processing consists of tokenization, stop word removal, stemming, and vectorization through term frequency-inverse document frequency (TF-IDF). The pre-processed data is then used to feed six ML algorithms: logistic regression, Naïve Bayes, K-Nearest Neighbor (KNN), SVM, random forest, and decision tree. They use K-fold cross-validation, the confusion matrix, and the classification report (precision, recall, and F1-score) for model evaluation. After comparing the models, the accuracy score is used to determine the best one: the random forest and decision tree models achieve the best performance, with over 99% accuracy. In comparison, KNN obtained the worst accuracy (~52%), and the other models' accuracies were over 96%. The results indicate that NLP pre-processing contributes to model training and improves accuracy.
Helmstetter et al. [22] use a dataset collected from Twitter. From the dataset, they use the text features as well as the metadata from Twitter, which corresponds to user-level features (e.g., number of followers) and tweet-level features (e.g., number of retweets). Moreover, they extract additional features, such as sentiment features using SentiWordNet [23] and topic features using Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Process (HDP) models. The features are then scaled and selected to feed several ML algorithms; the best performance is achieved by the ensemble method XGBoost, with an F1-score of 0.94. Another approach, presented in [24], proposes a voting ensemble mechanism with three classifiers (decision trees, Naïve Bayes, and KNN) to achieve a lower error rate than using the models separately. Besides, Wynne et al. [25] proposed a large ensemble model called the two-layer ensemble model: the first layer contains two sets of voting classifiers with five classifiers each, and the outputs of these two voting classifiers are used as input for the second layer, which contains a third, final voting classifier. In this case, the final voting classifier performs best on the LIAR dataset. However, this approach can be time-consuming and resource-intensive since each of the ten classifiers must be trained. Furthermore, Tian et al. [26] use a feature selection method called Genetic and Evolutionary Feature Selection (GEFES) [27] to identify a subset of features by means of a steady-state genetic algorithm. The selected features are used as input for a KNN model to classify news as fake or genuine, improving the model's performance. For the experiments, the authors use the BuzzFace dataset, which consists of 2282 news items from Facebook related to the 2016 US presidential elections. On this dataset, the authors achieved an accuracy of 91.3% for the classification task using the proposed approach.
Some researchers have proposed methods that work with news in languages other than English. For instance, Billones et al. [28] train ML models with a dataset of news in Filipino and compare their performance with that of models trained with English news. The results show that the models perform better on the English dataset than on the Filipino dataset due to the limited amount of labeled data in other languages. Khalil et al. [29] tackle the scarcity of data in other languages by creating a large Arabic fake news corpus of 606912 articles. They then train and test several deep learning algorithms, obtaining the best accuracy of 78.3% with the capsule network [30], which is based on a convolutional neural network. Additionally, many other works use deep learning strategies for English news classification. For instance, Alameri et al. [31] use neural networks and Long Short-Term Memory (LSTM) networks and compare them with classic ML approaches, finding that the LSTM outperforms the other models in terms of accuracy, precision, recall, and F1-score. Gupta et al. [32] also use deep learning approaches, including Convolutional Neural Networks (CNN) and LSTM, which are tested individually and then combined as an ensemble model to obtain the probability of truth. In this case, individually, CNN obtains better results than LSTM. Mahmud et al. [33] use Graph Neural Networks (GNN) to make predictions based on text and graph data, where the text is extracted using text representation learning techniques and the graph contains the news propagation data. The addition of graph data boosts the performance of the model.
The literature review highlights the importance of deep learning models in the task of news classification and emphasizes the need for datasets in languages other than English, such as Spanish. Therefore, this study proposes conducting a comparative analysis of machine learning models and deep learning models for detecting fake news in Spanish to develop a tool that implements the best method to determine the authenticity of news in real-time.

III. METHODOLOGY
This section describes the applied methodologies for creating the fake news classification web service. CRISP-DM was applied to implement the fake news classifier model, while SCRUM was used to develop the web service (see Fig. 1).

A. CRISP-DM Methodology
In order to develop the fake news classifier based on deep learning techniques, CRISP-DM Methodology is applied. This methodology consists of six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
The phases are summarized in the following points:
• Business Understanding: In this phase, the fake news problem is analyzed. To this end, the state-of-the-art frameworks for fake news classification are reviewed, which shows that there are few fake news classifiers for Spanish since most existing approaches focus on English news. Thus, this study aims to develop a fake news classifier for Spanish news.
• Data Understanding: Several datasets were analyzed to create a suitable dataset in Spanish for training the model. The final dataset consists of news extracted from the following sources: Fake-news-in-Spanish, Fake and real news, and Fake news Corpus Spanish. Besides, news from a website with fake Ecuadorian news was extracted to complement the dataset. The dataset consists of 9294 news items and the corresponding labels: 4081 fake and 5213 real.
• Data Preparation: In this phase, the data are integrated from multiple sources using Python. Each news item is preprocessed by removing stop words; that is, the dataset does not contain words such as "the", "is", "are", etc. Then, stemming is applied to each word to extract its base form. For instance, the stem of the word "eating" is "eat". Finally, tokenization is applied to each news item.
• Evaluation: The models based on MLP, CNN, and LSTM are evaluated using the F1 score. According to the results of this quantitative evaluation, the LSTM model achieved the best F1 score for classifying Spanish news as fake or real.
• Deployment: In this phase, the classifier was integrated with a web service to be available online.
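The Data Preparation steps above (stop-word removal, stemming, and tokenization) can be sketched as follows. This is only an illustrative sketch, not the pipeline used in the study: the stop-word set is a tiny hypothetical sample, and the `stem` function is a naive suffix stripper standing in for a real Spanish stemmer (for example, a Snowball stemmer). Tokenization is performed first here because the other two steps operate on individual tokens.

```python
import re

# Hypothetical, deliberately tiny stop-word sample; a real pipeline
# would use a complete Spanish stop-word list.
SPANISH_STOP_WORDS = {"el", "la", "los", "las", "de", "en", "y", "que",
                      "un", "una", "es", "son"}

# Naive suffix list, an assumption for illustration only; it is NOT a
# real stemming algorithm such as Snowball.
SUFFIXES = ("amente", "mente", "ando", "iendo", "ción",
            "es", "os", "as", "s", "o", "a")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [stem(tok) for tok in tokenize(text)
            if tok not in SPANISH_STOP_WORDS]


print(preprocess("Los gatos comiendo en la casa"))
```

The resulting token list would then be mapped to integer indices or embeddings before being fed to a neural network; that step is omitted here.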

B. SCRUM
In order to develop a web service that allows the execution of the fake news classifier, the SCRUM methodology is followed. This methodology mainly consists of:
• Product Backlog, where all the system requirements are defined.
• Sprint Backlog, which refers to the requirements to be developed in the sprint. They are defined during the sprint planning meeting.
• Sprint, which is the period that the Scrum Team (developers) uses to complete the development of the requirements within the Sprint Backlog. After a Sprint is completed, the Sprint Review and the Sprint Retrospective take place. Each sprint had a duration of 50 hours.
A Service-Oriented Architecture (SOA) is defined as the architectural design of the system, and a user interface is designed (see Fig. 2). Then, the system that integrates the fake news classifier model was developed using the Flask framework. In addition, HTML, CSS, and JavaScript are used for the front end.
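To make the service-oriented design more concrete, the following minimal sketch exposes a classifier behind a JSON endpoint. It is written against the standard WSGI interface using only the Python standard library so that it is self-contained; Flask, the framework used in this study, is itself a WSGI framework and would wrap the same logic in a route handler. The `/classify` path, the `text` and `label` field names, and `dummy_classifier` are all hypothetical and do not come from the study.

```python
import json


def dummy_classifier(text):
    # Hypothetical placeholder rule; the real service would call the
    # trained neural network model here.
    return "fake" if "!!!" in text else "real"


def json_response(start_response, status, payload):
    """Serialize payload as JSON and start the WSGI response."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def make_app(classifier):
    """Build a WSGI app that classifies the 'text' field of a JSON POST."""
    def app(environ, start_response):
        if (environ.get("REQUEST_METHOD") != "POST"
                or environ.get("PATH_INFO") != "/classify"):
            return json_response(start_response, "404 Not Found",
                                 {"error": "not found"})
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length))
            text = payload["text"]
            if not isinstance(text, str):
                raise TypeError("text must be a string")
        except (ValueError, KeyError, TypeError):
            return json_response(start_response, "400 Bad Request",
                                 {"error": "expected JSON with a 'text' field"})
        return json_response(start_response, "200 OK",
                             {"label": classifier(text)})
    return app


app = make_app(dummy_classifier)
```

For local experimentation, such an app could be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`; a production deployment would use a proper WSGI server instead.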

IV. THE PROPOSED METHOD
This section introduces the fake news classifier integrated into a web service available through the Internet. The overview of the proposed solution is shown in Fig. 3. The proposal includes a binary classifier that indicates whether the news is fake or real. The model's input is text preprocessed by removing stop words, applying stemming, and tokenizing. The classifier model is based on an LSTM architecture, and different hyperparameters were evaluated to select the best model configuration. Fig. 4 summarizes the final architecture of the model.
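To illustrate how an LSTM carries information across a sequence, the sketch below implements a single LSTM cell step with scalar input, hidden state, and cell state in plain Python. It is a didactic simplification under stated assumptions: the real model operates on vector embeddings with trained weight matrices, and the weights used here are arbitrary values, not parameters of the architecture in Fig. 4.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """One LSTM step. weights maps gate name -> (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, bias = weights[name]
        return activation(w_x * x + w_h * h_prev + bias)

    f = gate("forget", sigmoid)        # how much old memory to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # candidate memory content
    o = gate("output", sigmoid)        # how much memory to expose
    c = f * c_prev + i * g             # updated cell (memory) state
    h = o * math.tanh(c)               # updated hidden state
    return h, c


def run_sequence(xs, weights):
    """Feed a sequence through the cell and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, weights)
    return h


# Arbitrary illustrative weights (an assumption, not trained values).
WEIGHTS = {name: (0.5, 0.5, 0.0)
           for name in ("forget", "input", "candidate", "output")}

# The same inputs in a different order yield different final states,
# which is what lets the network take word order into account.
print(run_sequence([1.0, 0.0], WEIGHTS), run_sequence([0.0, 1.0], WEIGHTS))
```

In a full classifier, the final hidden state would be passed to a dense layer with a sigmoid output to produce the fake/real decision.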

V. RESULTS
This section presents the results of the evaluation of the proposed model. As mentioned above, several shallow models were compared with deep learning models, specifically artificial neural networks (ANN). Since ANNs performed better, three ANN architectures (MLP, CNN, and LSTM) were evaluated to select the best model for the proposed solution.
The evaluation metric used to assess model performance was the F1 score, the harmonic mean of precision and recall, whose formula is:
$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{1}$$
Table I shows the results of the models' evaluation. LSTM achieves the best performance with an F1 score of 0.746, followed by MLP and CNN with F1 scores of 0.738 and 0.678, respectively. Since LSTM is the best model, the developed web service integrates this model.
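As a worked illustration of the metric, the F1 score is the harmonic mean of precision and recall, and it can be computed from confusion-matrix counts as in the minimal sketch below. The counts in the example call are hypothetical and are not taken from the experiments in this study.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0  # convention when no positives are found or predicted
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 fake items caught, 2 false alarms, 2 misses.
print(f1_score(tp=8, fp=2, fn=2))
```

With these counts, precision and recall are both 0.8, so the F1 score is also 0.8.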
On the one hand, the CNN performs poorly compared with MLP and LSTM since it is more oriented to working with images than with text. On the other hand, LSTM networks perform better than the others since they are capable of learning sequences, which allows a classification based on the context of the whole text. Afterwards, the obtained results are compared with previous works that have trained machine learning and deep learning models for fake news detection with datasets containing news in Spanish. Table II presents the F1 scores obtained by six methods on the task of fake news classification in Spanish. The results show that the proposed method achieves the best performance, while the random forest approach obtains the worst. Additionally, it should be noted that few works use a Spanish corpus, and the corpora used in the existing works are small, containing at most 1500 news items, compared with the 9294 news items used in the current research.
Furthermore, the performance of the web service was evaluated to determine the time required for news classification. The web service was deployed on Microsoft Azure, and its response time was used as the metric, measured from the moment the user sends the request to the web service until it returns a classification response. Moreover, three scenarios were defined based on the length of the news content, which are described in Table III. Each scenario consists of 1386 requests to the web service for classifying news. Table IV summarizes the response times (in seconds) after executing all requests for each scenario with news content of small, medium, and large lengths. The mean response time of the web service is 1.08 seconds, indicating good performance according to [39], as websites with good time behavior must respond within a maximum of 10 seconds before users leave the site.

VI. CONCLUSION
This study proposes a fake news classifier for Spanish, trained in a supervised fashion and based on an LSTM architecture, to determine the authenticity of Spanish news. The LSTM architecture was selected for its superior performance compared to the MLP and CNN architectures: it maintains a memory state that preserves the relationship between the sequences of words in the text, which leads to better classification performance. The method was evaluated using both private and public datasets, with experiments showing an F1 score of 0.746, which is good for a dataset with limited news in Spanish. However, this value is lower than those reported in other studies that use English datasets, which indicates that prediction in languages other than English remains challenging, mainly due to the limited data. Besides, a web service integrating the classifier has been implemented to detect fake news in real-time.
For future work, the news can be classified into additional labels, such as satire or junk science, for a more comprehensive understanding of the fake news available on the internet, which would help detect it before it propagates.