Artificial Intelligence based Recommendation System for Analyzing Social Business Reviews

Analysing the reviews that clients post about products offered by e-commerce companies such as Amazon, in order to produce efficient recommendations, has recently received a lot of attention. However, generating effective recommendations on time remains a challenge. This paper proposes an artificial intelligence-based system. The proposed system uses the Incremental Learning based Method (ILbM) to train a neural network classifier. The ILbM uses the bagging technique in the process of training the classifier. To ensure a high degree of performance, the ILbM is implemented on Hadoop, which allows parallel execution. Compared to a similar system, the proposed system shows better results in terms of accuracy (97.5%), precision (95.7%), recall (91.5%), and time of response (36 seconds).
Keywords: ILbM; reviews; classifier; text analysing; training bagging; MapReduce; big data


I. INTRODUCTION
As technology continues to evolve in most areas of life, data is being generated daily and is growing rapidly. We are therefore in the big data era, in which data sets are large and complex.

A. Big Data and Its Properties
For data to be considered big data, it must exhibit certain characteristics, including volume, variety, and velocity [1]. A summary of these characteristics is illustrated in Fig. 1.

B. Big Data and Its Relationship with Business Intelligence and Analytics
Business intelligence and analytics (BI&A) and the field of big data analytics have been increasingly discussed and studied in both the academic and the business communities over the past two decades. Industry studies have highlighted this significant development [2]. As strong evidence of this, Amazon is considered an important source of big data related to transaction details and to customers. In addition, Amazon has employed Artificial Intelligence (AI) to analyze this data in order to increase its gain [3]. The big data generated by the systems used at Amazon (or at any other online shopping company) forms the basis for applying BI&A to improve the work of companies by means of artificial intelligence.

C. Problem Statement
Since financial transactions are considered sensitive in companies, employing artificial intelligence to increase gain is critical. The reason is that applying wrong (or inaccurate) BI&A to the database leads to poor or wrong decision making, and consequently to financial losses. This problem is especially pronounced when it comes to analysing and mining text. Fig. 2 illustrates the problem of having poor BI&A.
As shown in Fig. 2, a failure in the implementation step requires returning to the earlier steps. This in turn reflects poor accuracy, which is the main issue in this domain.
122 | P a g e www.ijacsa.thesai.org
Besides the accuracy issue, providing decisions at the right time is also critical. In other words, making decisions (such as providing good recommendations about a certain product to clients) while taking the time dimension into account is a sensitive issue. That is because a delay in making the decision and delivering it to customers may lead them to change their minds and may direct them to a competitor [5]. For this reason, high performance is a pressing need in the business domain when depending on artificial intelligence. Moreover, dealing with big data in the business domain requires solving the performance issue [6].

D. Research Questions
The accuracy and performance issues explained above lead to the following research questions:
1) How can artificial intelligence-based business systems provide satisfactory decisions to customers in terms of accuracy?
2) How can artificial intelligence-based business systems provide satisfactory decisions on time, taking into consideration dealing with and analyzing big data effectively?

E. Contribution
In general, the contributions of this paper can be presented as follows:
• In response to the first research question, this paper proposes a novel artificial intelligence system to provide accurate decisions. The system uses a novel algorithm called the Incremental Learning based Method (ILbM). The key idea of the ILbM algorithm is incremental learning using the bagging method.
• To answer the second research question, this paper employs Hadoop to perform the training phase in parallel [53]. Relying on the MapReduce technique, this ensures fast decision making when providing recommendations.
• Extensive experiments are conducted to show the effectiveness of the proposed methods and to compare them with similar approaches in terms of performance-based metrics and AI-based metrics.

F. Paper Organization
The rest of this work is organized as follows. Section II reviews the related research works. Section III provides the proposed artificial intelligence system. In Section IV, the used metrics are presented for evaluation purposes. Section V presents the experiments and analyses and discusses the results in light of a comparison with similar approaches. Finally, the conclusion of the paper is discussed in Section VI.

II. RELATED WORK
This section provides a brief background on the domain of this work, followed by a review of approaches that have previously been proposed for analysing and predicting big data in the social networks research field under the AI umbrella.

A. Background about Big Data
The development of information technology has led to the generation of huge amounts of data from various sources, especially social media and IoT sensors, and this data is growing exponentially every day. Data created in social media networks vary from texts, images, and videos to maps and sounds. These types of data are divided into structured and unstructured data: the relationship between friends is structured data that can be placed in a specific structure, whereas text is considered unstructured data [7]. Big data analysis is used primarily to handle large-scale data generated very quickly from a variety of sources and in a variety of formats. It is well suited to challenges that relational databases cannot handle. Techniques and methods for analysing this type of data have been studied in many publications, such as [8].
Analysing big data leads to what is called 'big impact' according to the research work [9]. In this context, emerging analytics research opportunities are born. They can be classified into five critical technical areas, which include data analytics, text analytics, web analytics, network analytics, and mobile analytics.

B. Social Media and Big Data
Social media big data is primarily generated from the available data on users' behavior, which express browsing histories, what users buy, and all the information they access online. The research in [10] provides a comprehensive view of how to manage the knowledge in social media big data. There are some common problems in applying big data techniques to social media data analysis [11]. These problems are data-driven versus theory-driven approaches, measurement validity, multi-level longitudinal analysis, and data integration.
Many studies have focused on big data analysis techniques with social media data [12][13][14]. Due to this focus and in terms of organization, big data analytics on social media is categorized into different classes to grasp their characteristics [15].

C. Machine Learning and Social Media Big Data
In the context of big data analysis, one must choose the appropriate method. Machine learning methods are extensively used today to analyze big data. The machine learning concept can generally be defined as follows: "computer software learns from experience E with respect to a particular task T and a performance metric P if its performance on task T, as measured by P, improves with experience E" [16]. T represents the problem to be addressed and resolved, such as a classification problem. P is the performance metric, which is related to the type of task T to be solved. Experience E represents the data used to conduct the learning process.
Machine learning is a subset of artificial intelligence that can modify itself by learning without any human intervention [17].
Different machine learning methods for analyzing big data are discussed in detail in [18]. That work compared many machine learning methods and found that, for analyzing big data, neural networks have advantages over other methods. A discussion of the challenges of learning with big data and the corresponding possible solutions in recent research was given in [19]. It states that, for big data, there is no single machine learning method that can be generalized, because different data require different machine learning methods. In addition, an analysis of the machine learning methods for big data is given in [20]. It emphasizes the new characteristics of machine learning used to analyze big data by proposing a machine learning framework for analyzing big data based on parallel computing and distributed storage.

D. Deep Learning
In this section, the researchers explain the main concept of deep networks. In addition, different deep learning network architectures are examined, with a focus on the architecture that is suitable for this research. A deep learning network has significantly more neurons and layers than a traditional neural network. This increase in neurons and layers supports solving more complex problems. Moreover, there are developments in how layers communicate with each other and in support for automatic feature extraction. For these reasons, deep learning is now frequently used to analyze big data.
Various studies have applied deep learning to analyze social media big data. There are many applications of deep learning, such as anticipating the behavior of users on social networks, sentiment analysis of text, and business analysis. The research in [21] concentrated on Twitter data; it investigated the behavior of shared learning by examining the behavior of subscribers. Traditional learning methods used to analyze social networking data in crises were criticized in [22], which applies deep learning to overcome these difficulties and proposes a new approach to improve social media analysis in crisis situations.
Analyzing sentiment by using social media data is an important process. A framework for sentiment analysis was developed in [23] to build a self-developed military sentiment dictionary using deep learning. A real-time syndromic surveillance system is introduced in [24]. This system relies on the analysis of Twitter social media data to support syndromic surveillance. Deep learning is used in this system to analyze data and determine the prevalence of a disease. Lampos and Cristianini [25] used social network users as sensors to predict real-time events, for example in rainfall prediction.
Furthermore, the prediction of earthquakes based on the same idea is developed in [26] by using deep learning. Social media post analysis is introduced in [27] to extract various meaningful information in order to predict events. In addition, deep learning is used to perform classification and recommendation accurately. A massive number of posts was analyzed in [28]. That work studies hashtags, retweets, the characters in each tweet, and so on to classify users by age. It uses a Deep Convolutional Neural Network after presenting the results of some traditional machine learning techniques such as SVM. The impact of recommendation and the adaptation process has become very large today, especially as data on social media keep growing. For instance, a Multi-view Deep Neural Network [29] was proposed to recommend computer games that one can play, based on data from Sony and Microsoft.
All these previous studies have appropriately contributed to the analysis of social media data, but there is still a gap, and the research presented in this paper works to bridge it. The business community wants to make the most of social media data, and it requests that this benefit be translated into recommendations and plans through an integrated system. Therefore, the present research concludes that designing an integrated system that the business community can use to improve sales, advertising campaigns, and inventory management is one of the important and required actions today. This research also discusses three deep neural networks that can be used extensively because they can handle huge data: the Deep Stacking Network, Long Short-Term Memory, and Deep Boltzmann Machines.

E. Deep Stacking Network (DSN)
Layers in this type of deep learning architecture are arranged in a hierarchical form, as illustrated in Fig. 3 [30]. This deep learning type supports parallel learning, where each module is trained in isolation from the other modules. In Fig. 3, there are only four modules, but hundreds of these modules may exist, depending on the type of problem to be solved. DSN is mainly used to process huge amounts of data, such as social network data. The DSN has been developed and extended to the Tensor Deep Stacking Network (T-DSN) [31], which is similar to a DSN but adds some new layers, such as a hidden layer. One advantage of DSN is that the output of a lower module is combined with the inputs of a higher module to form the final input for this higher module. Applications designed based on DSN for data retrieval are described in [32].

F. Deep Boltzmann Machines (DBM)
A Deep Boltzmann Machine (DBM) has several hidden layers [33]. Geoff Hinton defines the general Boltzmann machine as follows: "A network of symmetrically connected, neuron-like units that make stochastic decisions about whether to be on or off." DBM is an undirected model that is used for regression and time series data. The hidden variables in a DBM are used to learn the probability distribution of the input data. Alternatively, in Restricted Boltzmann Machines (RBM) [34], the network prevents any connection between nodes in the same layer, as illustrated in Fig. 4. In [35], a regression model was built using DBM to complete shapes. This network is successful in analysing heterogeneous data.

G. Long Short-Term Memory (LSTM)
The Recurrent Neural Network is used for many purposes, such as speech and voice recognition, time series prediction, and natural language processing. This network is used to process sequential data, such as the data used to implement the autocomplete feature that predicts the remaining words. Long short-term memory (LSTM) is an extension of the Recurrent Neural Network [36] that can consider much longer input sequences and overcome some obstacles. The structure of the LSTM is illustrated in Fig. 5.
The researchers in [37] proposed a system built on the reviews provided by Amazon. The system used a neural network with a Support Vector Machine (SVM) as a classifier, and the accuracy achieved in that work is 81%. The researchers of the current work call this system the NN-SVM system for short. It is worth mentioning that this system is compared later with the system proposed in this paper.

III. PROPOSED INTELLIGENT SYSTEM FOR SOCIAL MEDIA ANALYSIS
The proposed framework in Fig. 6 consists of a database (data set), a number of users, and the proposed Intelligent Social Media Analysis System (ISMAS).

A. Data Set
The database is used to train and test the ISMAS. The data used in this study come from a set of approximately 3.5 million product reviews collected from Amazon.com by Fang et al. [38]. Thousands of Amazon product reviews were obtained for use in training and testing the deep learning model. A large group of reviews (28000) was collected in order to be classified according to their ratings; each review is 40 words long. We adopted this length because most of the reviews fall within it. These reviews have been labeled with ratings from 1 to 5. The resulting data are divided into two parts: the first part, 80% of the dataset, is used to train the model, and the remaining part is used to test the accuracy of the model after training. Ratings 1 and 2 are considered negative reviews, while ratings 3, 4, and 5 are considered positive reviews. Fig. 7 displays a review from the dataset.
Training and test texts are extracted together with their attached ratings, which are used as their labels, as illustrated in Fig. 8. The text of the reviews is divided into words, and some preliminary processes are performed to prepare the texts for the analysis process.
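The labelling rule and the 80/20 split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy records, and the use of a seeded random shuffle before splitting are assumptions, since the paper does not state how the split was drawn.

```python
import random


def rating_to_label(rating: int) -> str:
    """Map a 1-5 star rating to the binary class described in the text."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be in 1..5, got {rating}")
    return "negative" if rating <= 2 else "positive"


def train_test_split(reviews, train_fraction=0.8, seed=0):
    """Shuffle a copy of the reviews and split it into train and test parts.

    The shuffle is an assumption; the paper only gives the 80/20 proportion.
    """
    shuffled = list(reviews)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy records: (review text, star rating)
toy_reviews = [("great product", 5), ("very dry", 1), ("good", 4),
               ("poor", 2), ("fine", 3)]
labelled = [(text, rating_to_label(stars)) for text, stars in toy_reviews]
train, test = train_test_split(labelled)
```

With the 28000 reviews mentioned in the text, the same proportion would give 22400 training reviews and 5600 test reviews.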

B. Proposed Intelligent Social Media Analysis (ISMAS)
The ISMAS consists of three main components. Table I summarizes the components, the task of each component, and the place where they are installed.

Component Name          Task
Text Analyzer           Text analyzing.
Deep Learning Model     Determining degree of acceptance.
Decision Maker          Making recommendations.

Fig. 9 illustrates the architecture of the proposed ISMAS.

1) Text analyzer:
In general, texts are divided into tokens; for example, the sentence "training and test data" is divided into the tokens 'training', 'and', 'test', and 'data'. Then the researchers add a Part-Of-Speech tag to each token, for example adjective, noun, or verb. The lemmatization process is applied to reduce each token to its root form; for example, the word "building" can be reduced to "build", and "words" is transformed to "word". Punctuation can be removed to improve accuracy. Likewise, words such as "and" and "to" may add noise to the texts, so the researchers delete those words before analyzing the product reviews. In addition, they delete any word that consists of two letters or fewer.
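The cleaning steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the stop-word list is a small hypothetical one (the paper does not give its list), lower-casing is added for consistency, and Part-Of-Speech tagging and lemmatization are omitted because they require a language-processing library.

```python
import string

# Small illustrative stop-word list (an assumption; the paper does not give its list).
STOP_WORDS = {"and", "the", "for", "with", "this", "that", "was", "are"}


def preprocess(review: str) -> list[str]:
    """Lower-case the text, strip punctuation, split on whitespace, and drop
    stop words and tokens of two letters or fewer."""
    no_punct = review.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [t for t in tokens if len(t) > 2 and t not in STOP_WORDS]


tokens = preprocess("Training and test data, to be analysed!")
# tokens == ['training', 'test', 'data', 'analysed']
```

Here "and" is removed as a stop word, while "to" and "be" are removed by the two-letter rule.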
Upon this, the text analyzer performs a kind of text mining, where the last step of this process is extracting the knowledge (which is achieved by the deep learning model). Fig. 10 illustrates the three main steps of text mining [39].
In detail, step 1 establishes the corpus, whereby all relevant unstructured data are collected. Then, the researchers digitize and standardize the collection. Finally, they place the collection in a common place (a directory that consists of separate files).
In step 2, the Term-by-Document Matrix (TDM) is created. All terms are included in the TDM, such as stop words, synonyms, homonyms, and stemming.
In step 3, patterns/knowledge is extracted relying on deep learning model.
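A term-by-document matrix of the kind built in step 2 can be sketched as follows. This is a minimal illustration that uses raw term counts; the function name and the toy documents are hypothetical, and the paper does not state which weighting (raw counts, binary, TF-IDF) its TDM uses.

```python
from collections import Counter


def build_tdm(documents: list[list[str]]):
    """Return (terms, matrix), where matrix[i][j] is the number of times
    terms[i] occurs in documents[j]."""
    counts = [Counter(doc) for doc in documents]
    terms = sorted(set().union(*documents))
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


# Hypothetical tokenized reviews
docs = [["good", "great", "good"], ["dry", "good"]]
terms, tdm = build_tdm(docs)
# terms == ['dry', 'good', 'great']
# tdm   == [[0, 1], [2, 1], [1, 0]]
```

Each row corresponds to a term and each column to a review, so the column for a review can serve as its input vector for the classifier.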

2) Deep learning model:
Here, the intelligent model (ANN) is trained on the data that are analysed and stored in the TDM. This component performs the ILbM to determine the degree of acceptance of a certain product. The ILbM relies on the bagging technique to enable incremental learning, as shown in Fig. 11. As shown in Fig. 11, there are six steps in the proposed model, starting with filling a data loop and finishing with the final prediction step. Below is a description of each step.

a) Filling data loop:
Here, a buffer is used as temporary data storage, and it is linked with the training data set. At the beginning, the buffer is empty, and it is gradually filled with data supplied from the dataset. The loop continues until the buffer is filled. The size of the buffer is calculated according to formula 1, which depends on the size of the original training data set. When the buffer is full, the next step is executed.
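The buffer-filling loop can be sketched as a generator that hands over the buffer contents as a bag each time the buffer is full and then starts again with an empty buffer. This is an illustration only: the buffer size used here (training-set size divided by ten) is an assumption inferred from the ten bags reported later in this section, and the handling of leftover records is also assumed. Note that this follows the sequential procedure described in the text rather than classical bagging, which samples with replacement.

```python
def make_bags(training_data, buffer_size):
    """Fill a buffer from the training data; each time it is full,
    emit its contents as a bag and continue with an empty buffer."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    buffer = []
    for record in training_data:
        buffer.append(record)
        if len(buffer) == buffer_size:
            yield buffer
            buffer = []      # data refresh: the old records are dropped
    if buffer:               # assumed: leftovers form a final, smaller bag
        yield buffer


training_data = list(range(100))              # stand-in for the training reviews
n_bags = 10                                   # assumed from the ten bags in the text
buffer_size = len(training_data) // n_bags
bags = list(make_bags(training_data, buffer_size))
```

Because each bag is independent of the others once it has been emitted, the bags can later be processed separately.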

b) Creating a bag:
The bag represents a small training dataset that contains a part of the original dataset. The current bag is used to train the first classifier (in the fourth step). When the buffer is filled again in the next iteration, a new bag is created.

c) Old data deletion:
This step refers to emptying the buffer. In other words, the old data used to create the current bag is deleted and new data is supplied. This process is called data refresh. It has the advantage of enabling the bags to be processed in parallel, as described later when it comes to enhancing the performance of the proposed system in terms of response time.

d) Training artificial neural network:
In this step, the first classifier is trained on the data contained in the first (current) bag. The training process consists of two main stages: (1) extracting the features of the analysed text generated by the text analyser component; and (2) pooling the extracted features to draw a deep description of a product. The extracted features are mainly inspired by strong and weak words included in the reviews of the products. Fig. 12 illustrates how to extract features.
In Fig. 12, the features are represented by a vector that contains highlighted words taken from the review of a product. The words included in the vector are taken from the review shown in Fig. 8. The words that have a positive trend are considered strong features that support the product, while the words that have a negative trend are considered weak features. In Fig. 12, the words 'Good' and 'Great' are strong features, while the words 'Very' and 'Dry' are weak features. Relying on the vector of features, the ANN model (classifier) is trained.
After finishing the extraction stage, the pooling stage starts. In the pooling stage, both the strong and weak features are formed to construct a sentence. The sentence is used to train the ANN to give an output (i.e., the class of the sentence to be positive or negative). The training process lasts until no reviews are found in the current bag. Consequently, the output of training ANN step is a trained classifier that can predict the class of a given review. Fig. 13 illustrates the structure of the ANN.
To generate the outputs, an activation function is used. In this work, the Softmax function is employed since it supports multiclass outputs. In other words, the Softmax function can handle two or more classes [40]. Therefore, the two classes (positive and negative) can be generated as numeric data. Visually, the Softmax activation function is illustrated in Fig. 14.
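For reference, the Softmax function maps a vector of raw scores to probabilities that sum to one. The sketch below uses the standard definition with the usual max-subtraction for numerical stability; the two-element score vector and the order of the class names are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["positive", "negative"]       # assumed ordering of the outputs
probs = softmax([2.0, 0.5])              # hypothetical raw scores
predicted = classes[probs.index(max(probs))]
```

The class with the largest probability is taken as the prediction; the same function works unchanged for more than two classes.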
The Softmax function is attached before the output layer, where positive class = 1 and negative class = -1, as shown in Fig. 15.

e) Combination:
Based on formula 1, there will be 10 bags. Each bag is used to train an ANN classifier, and consequently there will be 10 classifiers. Since each classifier generates a result, a combination process is performed to generate the final result (prediction). Fig. 16 illustrates the combination of the results of the classifiers.
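The paper does not spell out the rule used to combine the ten results. One common choice in bagging is a majority vote, which is sketched below purely as an assumption; the function name, the tie-breaking rule, and the example votes are hypothetical.

```python
def combine_votes(predictions):
    """Combine per-classifier labels (+1 positive, -1 negative) by majority
    vote; a tie is resolved as positive (an arbitrary choice)."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    return 1 if sum(predictions) >= 0 else -1


votes = [1, 1, -1, 1, -1, 1, 1, -1, 1, 1]   # hypothetical outputs of ten classifiers
final = combine_votes(votes)                # seven positive votes against three
```

With the +1/-1 encoding used for the two classes, the sign of the sum of the votes gives the majority class directly.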
It is worth mentioning that each classifier takes the knowledge learned by the previous one and adds new knowledge to it. For example, classifier 2 learns by adding to the knowledge of classifier 1. Thus, the final classifier has the incremental knowledge learned by all the nine previous classifiers.

f) Output final prediction:
In this step, the final prediction generated by the trained ANN model is passed as an input to the decision-maker component. This step includes marking the review as positive or negative for further manipulation by the evaluator.
3) Decision maker:
This component is responsible for providing recommendations. These recommendations fall in the business domain and concern adapting to the situation that currently dominates the market. In the context of the business domain, adaptation refers to a process that targets the highest gain. This can be achieved by recommending whether to stop producing a specific product or to supply the market with more items of it, depending on whether the product is predicted to have a negative or a positive trend. Since there are many products under review each time, the decision maker groups the positive products in one cluster and the negative products in another. This makes it possible to provide one recommendation for all positive products and one for all negative products. Fig. 17 shows the clustering of products.
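The grouping performed by the decision maker can be sketched as follows. This is an illustration under assumptions: the product identifiers are hypothetical, and the rule that a product's trend is the sign of the sum of its review labels (with a tie counted as positive) is assumed, since the paper does not specify how the reviews of one product are aggregated.

```python
from collections import defaultdict


def cluster_products(predicted_reviews):
    """Group product ids into a positive and a negative cluster.

    predicted_reviews is an iterable of (product_id, label) pairs with
    label +1 (positive review) or -1 (negative review).
    """
    totals = defaultdict(int)
    for product_id, label in predicted_reviews:
        totals[product_id] += label
    clusters = {"positive": [], "negative": []}
    for product_id in sorted(totals):
        key = "positive" if totals[product_id] >= 0 else "negative"
        clusters[key].append(product_id)
    return clusters


# Hypothetical predictions for three products
predictions = [("P1", 1), ("P1", 1), ("P2", -1), ("P2", 1), ("P2", -1), ("P3", -1)]
clusters = cluster_products(predictions)
# clusters == {'positive': ['P1'], 'negative': ['P2', 'P3']}
```

One recommendation can then be attached to each of the two clusters rather than to every product separately.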

C. Employing Hadoop for Better Performance
Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers [41]. It is at the center of an ecosystem of big data technologies that are primarily used to support advanced analytics initiatives, including predictive analytics, data mining, and machine learning. Hadoop systems can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing, analyzing, and managing data than relational databases and data warehouses provide [42]. It is worth mentioning that the data supplied to the Hadoop platform should follow security and privacy agreements. Security and privacy issues have been emphasized in many studies [43][44][45][46][47][48][49][50][51][52]. In this work, privacy and security of data are considered out of scope; they will be considered in future work.
Hadoop allows parallel execution of jobs. This means that a job that is intended to be executed on Hadoop must be parallelized. In this work, the job that is executed in parallel on Hadoop is the training phase of the model. Because each bag is used to train one classifier and the number of bags is determined in proportion to the size of the training data, each classifier can be trained separately in a single thread. Fig. 18 illustrates using Hadoop to train the classifiers using threads.
Hadoop uses MapReduce technique to perform parallel execution. The major advantage of MapReduce is that it is easy to scale data processing over multiple computing nodes [53]. The MapReduce technique contains two important tasks, namely Map and Reduce. The main mission of the Map is to divide the job into parts and then distribute the parts over computational nodes (servers) for execution. The main mission of the Reduce is to collect the results of the executed parts and return them to the master node. Fig. 19 illustrates the mechanism of MapReduce technique used for training the classifier.
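The Map and Reduce roles described above can be illustrated with a small local sketch. It does not use Hadoop: a thread pool stands in for the computational nodes, and a per-class word count stands in for ANN training. All names and the toy bags are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def train_on_bag(bag):
    """Map-side work (stand-in for ANN training): count words per class."""
    model = {"positive": Counter(), "negative": Counter()}
    for tokens, label in bag:
        model[label].update(tokens)
    return model


def merge_models(a, b):
    """Reduce-side work: merge two partial models by adding their counts."""
    return {label: a[label] + b[label] for label in a}


# Hypothetical bags of (tokens, label) pairs
bags = [
    [(["good", "great"], "positive")],
    [(["dry"], "negative"), (["good"], "positive")],
]

with ThreadPoolExecutor(max_workers=len(bags)) as pool:
    partial_models = list(pool.map(train_on_bag, bags))   # one task per bag
final_model = reduce(merge_models, partial_models)         # combine the results
```

The structure mirrors the description: the work is divided by bag, each part is processed independently, and the partial results are collected and combined at one place.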
As shown in Fig. 19, there are 10 computation nodes. Each bag (i.e., the training data set used to train one classifier) is loaded on a node. After processing (i.e., training the ANN on each node), the results are gathered by the Reduce task and returned to the master node. Here, the results are represented by the rules that the classifiers have learned. At the master node, all the rules are combined to form the final trained classifier.

IV. USED METRICS
Two types of metrics are presented for use in the evaluation process. They are AI-based metrics and performance-based metrics.

A. AI-based Metrics
Basically, the confusion matrix (CoMa) is an effective benchmark for analyzing how well a classifier can recognize the reviews of different classes. The CoMa is formed considering the following terms [41]:
• True positives (TP): positive reviews that the classifier correctly labels as positive.
• True negatives (TN): negative reviews that the classifier correctly labels as negative.
• False positives (FP): negative reviews that the classifier incorrectly labels as positive.
• False negatives (FN): positive reviews that the classifier incorrectly labels as negative.
Relying on the CoMa, the accuracy (Acc), precision (Pre), and recall (Rec) metrics are derived. For a given classifier, the accuracy is the recognition rate, which is the percentage of the test set reviews that are correctly classified. The accuracy is defined as [41]:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
Accuracy-based evaluation. In this context, a higher accuracy corresponds to a better classifier output. The maximum value of the accuracy metric is 1 (or 100%), which is achieved when the classifier classifies all the reviews correctly without any error during the classification process.
Precision refers to the exactness (what percentage of the tuples that the classifier labelled as positive are actually positive). It is given by [41]:
$$Pre = \frac{TP}{TP + FP}$$
Precision-based evaluation. In this context, a higher precision corresponds to a better classifier output. The maximum value of the precision metric is 1 (or 100%), which is achieved when FP=0.
Recall refers to the completeness (what percentage of the positive tuples did the classifier label as positive). It is given by [41]:
$$Rec = \frac{TP}{TP + FN}$$
Recall-based evaluation. In this context, a higher recall corresponds to a better classifier output. The maximum value of the recall metric is 1 (or 100%), which is achieved when FN=0.
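The three metrics can be computed from the confusion-matrix counts using their standard definitions, as in the following sketch. The counts in the example are hypothetical and are not results from the paper.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return {
        "accuracy": (tp + tn) / total,
        # Precision and recall are reported as 0.0 when undefined.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for a test set of 200 reviews
m = classification_metrics(tp=90, tn=80, fp=10, fn=20)
# accuracy = 170/200 = 0.85, precision = 90/100 = 0.9, recall = 90/110
```

As stated in the text, precision reaches 1 when there are no false positives and recall reaches 1 when there are no false negatives.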

B. Performance-based Metrics
Time of response (ToR) is used to evaluate approaches in terms of performance. The ToR is calculated as the total time of the five main steps that are illustrated in Fig. 11 (i.e., the fill data loop, bag creation, ANN training, combination, and output steps). ToR is given as:
$$ToR = T_{fill} + T_{bag} + T_{train} + T_{comb} + T_{out}$$
where $T_{fill}$, $T_{bag}$, $T_{train}$, $T_{comb}$, and $T_{out}$ denote the time consumed by the fill data loop, bag creation, ANN training, combination, and output steps, respectively.
It is important to notice that the shorter the ToR is, the better the performance of this approach is.

V. EXPERIMENTAL RESULTS AND EVALUATIONS
This section is arranged so that it first presents the setup and the approach that is intended to be compared. Then, the actual results and the corresponding discussions are provided.

A. Setup
The proposed AI-based system is implemented on a laptop that has the specifications shown in Table III.
The researchers selected one system (NN-SVM system), mentioned in the related work section, to compare with the proposed ISMAS system. It is illustrated in Table IV.

B. Results and Discussion
The results are provided according to both the AI-based metrics as well as the performance metric.

1) Results depending on the AI-based metrics:
Since there are 10 ANN classifiers due to using 10 bags, the results of the AI-based metrics are arranged in Table V. The reason for the enhancement of the AI-based metrics is the incremental learning of the classifier. In other words, the cumulative knowledge gained by incremental learning is more effective than the single learning iteration in the NN-SVM system.

2) Results depending on the performance-based metric:
In the context of this evaluation, the ToR is calculated for each classifier, and then the average time is obtained. Fig. 20 shows the results.
Performance-based metric discussion. It is obvious from Fig. 21 that the proposed ISMAS system outperformed the NN-SVM system. Comparing the ToR values of the two systems shows that the proposed ISMAS system is more than twice as fast as the NN-SVM system. The reason behind the enhanced performance of the proposed ISMAS system is the use of Hadoop to perform the training in parallel. Theoretically, the ToR of the proposed system should be 10 times lower than the ToR of the NN-SVM system. However, the time needed to create the Mappers that distribute the job over the nodes, to create the Reducers, and to combine the sub-results into the final results is the reason why the ToR of the proposed system is only about 2 times lower than that of the NN-SVM system.
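The gap between the theoretical tenfold gain and the observed gain of about two can be illustrated with a simple model in which the serial work is split evenly over the nodes and a fixed coordination overhead is added. The model and all the numbers below are assumptions chosen for illustration; they are not measurements from the paper.

```python
def parallel_speedup(serial_time: float, n_nodes: int, overhead: float) -> float:
    """Speedup when serial_time splits evenly over n_nodes and a fixed
    coordination overhead (task setup, merging results) is added."""
    parallel_time = serial_time / n_nodes + overhead
    return serial_time / parallel_time


# Hypothetical timings in seconds
ideal = parallel_speedup(100.0, 10, 0.0)           # no overhead: 10.0
with_overhead = parallel_speedup(100.0, 10, 40.0)  # 100 / (10 + 40) = 2.0
```

Under this model, once the overhead is several times larger than the per-node work, adding more nodes yields little further gain.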

VI. CONCLUSION
Analysing the reviews that clients post about products offered by e-commerce companies such as Amazon, in order to produce efficient recommendations, has recently received wide attention and is considered vital research in the business field. In this paper, the researchers propose an artificial intelligence recommendation system for analysing reviews in the business area. The proposed system consists of three main components, which are the text analyser, the deep learning model, and the decision maker. The text analyser component uses the Term-by-Document Matrix (TDM) in the process of analysing the reviews. The reviews are obtained from a database of Amazon reviews. The deep learning model (classifier) is trained based on the bagging technique. Ten bags are created with data drawn from the database, and ten classifiers are trained. The knowledge (represented by the rules for classifying the reviews) that is gained by the first classifier is added to the next one, and so on until the 10th classifier. To enhance the response time (the performance), the ten classifiers are trained in parallel on the Hadoop platform using the MapReduce technique. The decision-making component clusters the final results of the classifiers to generate an efficient recommendation for better gain. The proposed system is evaluated based on two kinds of metrics, which are the artificial intelligence metrics (accuracy, precision, and recall) and the performance metric (time of response). Compared to the similar NN-SVM system, the proposed system showed better results as follows: accuracy (96.7), precision (95.1), recall (91.5), and time of response (36 seconds).
In terms of limitations, the proposed system did not consider the sentiment analysis that may be found in the reviews provided by the customers.
In future work, the researchers intend to take into account the sentiment analysis to obtain a higher accuracy percentage. In addition, they intend to apply the proposed system on a real dataset in real time taking into consideration privacy and security issues.