Single and Ensemble Classification for Predicting User’s Restaurant Preference

Classification is one of the most attractive and powerful data mining functionalities. Classification algorithms are applied to real-world problems to produce intelligent prediction models. Two main categories of classification algorithms can be adopted for generating prediction models: Single and Ensemble classification algorithms. In this paper, both categories are utilized to generate a novel prediction model that predicts restaurant category preferences. More specifically, the central idea of this paper is to construct an effective prediction model, using Single and Ensemble classification algorithms, that helps people determine the most relevant place to go based on their demographic data, income level and place preferences. This paper therefore introduces a new application of the classification task. According to the reported experimental results, an effective Restaurant Category Preferences Prediction Model (RCPPM) can be generated using classification algorithms. In addition, Bagging Homogeneous Ensemble classification produced the most effective RCPPM.
Keywords: Classification; data mining; ensemble algorithms; restaurant preferences


I. INTRODUCTION
With the increasing accessibility of innumerable data collections, extracting interesting patterns from such data has become a necessity. Data mining involves extracting interesting and helpful patterns from enormous amounts of data [1]. Classification is a well-known data mining functionality that refers to the process of generating a prediction model and using it to predict categories for new, unseen samples. More specifically, classification can be considered a three-step process. The first step is generating the prediction model from a "training" dataset that comprises a set of samples, where each sample is associated with a categorical class label. Classification problems can be differentiated according to: (i) the number of class labels featured in the dataset and (ii) the number of class labels associated with each sample in the dataset. With respect to the number of labels featured in the dataset, two kinds of classification problems can be recognized: binary and multi-class. In binary classification problems, the dataset includes only two labels, while multi-class problems feature more than two labels. Regarding the number of labels associated with each sample, two types of classification problems can likewise be distinguished: single-label and multi-label. When each sample in the dataset is associated with exactly one label, the problem is a single-label classification problem, whereas if several labels can be associated with one sample, it is a multi-label classification problem. Several classification algorithms can be utilized to produce the prediction model for each classification problem. After the prediction model is generated, the next step is evaluation, in which the performance of the model is assessed to determine whether it is suitable for predicting class labels for new samples.
Several measures can be used to evaluate the effectiveness of prediction models; accuracy and Area Under the ROC Curve (AUC) are the most widely used [2], [3]. Based on the values obtained from the evaluation measures, a decision can be made on whether or not to utilize the model for future prediction. The last step in the classification process is model usage, where the prediction model is utilized to predict class labels for new, unseen data. Classification has been employed in many application domains, including text categorization [4], bioinformatics [5], manufacturing [6], e-learning evaluation systems [7], medical diagnosis [8], data management [9], music categorization [10] and movie genre prediction [11]. Among these, music categorization and movie genre or genre preference prediction [12], [13] can be considered entertainment applications of classification. To the best of our knowledge, no previous work has utilized classification algorithms for predicting restaurant category preferences.
In this paper, a novel application of classification is introduced. Classification algorithms are utilized to generate a Restaurant Category Preferences Prediction Model (RCPPM), which can be considered an entertainment application of data mining. Using the RCPPM, the category of the preferred restaurant can be predicted for a user based on their demographic data, income level and place preferences. This helps people find the most suitable restaurant category without wasting time trying several places or searching through a huge number of available options. To this end: (i) a novel dataset was collected, using a survey, in order to build the desired prediction model and (ii) two classification styles, i.e. single and ensemble classification algorithms, were utilized. The RCPPM addresses a single-label multi-class classification problem: each sample (user) is assigned a single class label (preferred restaurant category) from several available categories. It is interesting to note that the RCPPM could be utilized as a "recommender system" that suggests a set of real places to the user. More specifically, the RCPPM could be linked with a database of real places in a specific country, each associated with a category (class label). The recommendation process commences with acquiring features from the user; the RCPPM then predicts the category of the preferred place from the given features. After that, all the real places stored in the database under the predicted category are presented to the user.
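The recommendation flow described above (acquire features, predict a category, look up the places stored under that category) can be sketched as follows. This is only an illustrative sketch: the function names, the feature key, the category names and the places are hypothetical placeholders, and `toy_model` stands in for a trained RCPPM.

```python
from typing import Callable, Dict, List

# Hypothetical lookup table: restaurant category -> real places in that category.
# The categories and places are placeholders, not values from the paper's dataset.
PLACES_BY_CATEGORY: Dict[str, List[str]] = {
    "category_a": ["Place 1", "Place 2"],
    "category_b": ["Place 3"],
}


def recommend(user_features: Dict[str, str],
              predict_category: Callable[[Dict[str, str]], str],
              places_by_category: Dict[str, List[str]]) -> List[str]:
    """Predict the preferred category, then return every stored place in it."""
    category = predict_category(user_features)
    return places_by_category.get(category, [])


def toy_model(features: Dict[str, str]) -> str:
    """Stand-in for a trained RCPPM; a real model would be learned from data."""
    return "category_a" if features.get("income_level") == "high" else "category_b"


suggestions = recommend({"income_level": "high"}, toy_model, PLACES_BY_CATEGORY)
print(suggestions)
```

Keeping the predictor as a plain callable means any of the classifiers discussed in this paper could be plugged in without changing the lookup step.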
The remainder of this paper is organized as follows. Section II supplies the reader with the essential background to the work presented in this study. Section III shows the methodology that has been followed to generate the RCPPM. Section IV presents an overview of the main characteristics of the dataset used to generate the RCPPM. Section V presents the obtained results followed by Section VI with the conclusion of the presented work and directions for future research.

II. BACKGROUND
Classification is an interesting and challenging research area. Many researchers have applied classification algorithms to real-world problems because of the potential benefit of producing prediction models that can predict a solution for each instance of the considered problem. As noted in the introduction to this paper, much research work has been conducted in various domains such as the medical, biological, social and entertainment domains. In order to apply classification algorithms to real-world problems, the researcher should be knowledgeable about the available classification algorithms. This section provides the necessary background on classification algorithms. Classification algorithms can be divided into two main categories: (i) "Single" classification algorithms and (ii) "Ensemble" classification algorithms. In Single classification, only one classifier, generated using one classification algorithm, is used for predicting the output (class label). Several algorithms are available for this purpose; the most widely used are:
• Naïve Bayes (NB) algorithms, which generate probabilistic classifiers relying on Bayes' theorem.
• Decision Tree (DT) algorithms, which produce decision tree classifiers where non-leaf nodes represent features (input) and leaf nodes represent class labels (output).
• Rule-Based (RB) algorithms, which generate classifiers composed of a set of "If-Then" rules. Features (input) appear on the If side, while class labels (output) appear on the Then side.
• k-Nearest Neighbor (kNN) algorithms, whose classifiers are referred to as lazy classifiers because no classification model is generated in advance. Class labels (output) are predicted based on similarity to the stored training samples.
• Artificial Neural Network (ANN) algorithms, which produce sophisticated mathematical classifiers composed of connected input/output units (neurodes) and communication channels (connections).
• Support Vector Machine (SVM) algorithms, which generate classifiers by finding a "hyperplane" that separates the two classes featured in the dataset.
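As an illustration of one of the single classifiers listed above, the following is a minimal sketch of a Naïve Bayes classifier for categorical features. It assumes Laplace (add-one) smoothing and log-space scoring, neither of which is specified in this paper, and the feature values and category names in the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, features):
    """Return the label with the highest posterior score (Bayes' theorem, log space)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(features):
            # Laplace smoothing (an assumption) avoids zero probabilities.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (age group, income level) -> preferred category.
SAMPLES = [("young", "low"), ("young", "low"), ("young", "medium"),
           ("adult", "high"), ("adult", "high"), ("adult", "medium")]
LABELS = ["fast_food", "fast_food", "fast_food",
          "fine_dining", "fine_dining", "fine_dining"]

nb_model = train_naive_bayes(SAMPLES, LABELS)
print(predict_naive_bayes(nb_model, ("young", "low")))
```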
With respect to Ensemble classification, several classifiers cooperate to output a more effective prediction than can be acquired from a single classifier. If the base classifiers within the Ensemble are generated using one classification algorithm, the Ensemble is referred to as "Homogeneous", whereas if the base classifiers are produced using more than one classification algorithm, the Ensemble is called "Heterogeneous" [14]. Any classification algorithm, such as DT, NB or SVM, could be used to construct the base classifiers within the Ensemble. Three fundamental methods are usually used to combine the results produced by the individual classifiers: weighted averaging, majority voting and averaging [15]. Numerous researchers have provided theoretical and practical evidence that an Ensemble generally produces more effective predictions than its base classifiers do when used alone (single classification) [14], [16], [17]. The most widely used Ensemble classification algorithms are:
• Bagging, in which several classifiers are constructed in parallel, using different variations of the considered dataset. To output a prediction, voting is adopted to combine the results from the trained classifiers [18], [19].
• Boosting, in which several classifiers are generated sequentially, so that the information acquired by one classifier can be used to enhance the training process of the next classifier [19], [20].
In this paper, several Single and Ensemble classification algorithms are utilized to generate the desired RCPPM.
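The Bagging idea described above (train each base classifier on a different variation of the dataset, then vote) can be sketched in a few lines. This is an illustrative sketch only: it uses bootstrap resampling to create the dataset variations and a 1-nearest-neighbor rule as the base classifier purely for brevity, whereas the experiments in this paper use DT and NB base classifiers. The numeric features and category names are hypothetical.

```python
import random
from collections import Counter


def nearest_neighbor_label(train, point):
    """Base classifier: label of the closest training point (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], point))
    return label


def bagging_predict(train, point, n_models=25, seed=0):
    """Fit each base classifier on a bootstrap sample and combine by majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) samples with replacement.
        bootstrap = [rng.choice(train) for _ in train]
        votes.append(nearest_neighbor_label(bootstrap, point))
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: (age, income in thousands) -> preferred category.
TRAIN = [((20, 1.0), "fast_food"), ((22, 1.5), "fast_food"), ((19, 0.8), "fast_food"),
         ((24, 1.2), "fast_food"), ((21, 1.1), "fast_food"),
         ((45, 8.0), "fine_dining"), ((50, 9.5), "fine_dining"), ((48, 7.5), "fine_dining"),
         ((52, 10.0), "fine_dining"), ((47, 8.8), "fine_dining")]

print(bagging_predict(TRAIN, (21, 1.0)))
```

Because each bootstrap sample omits some training points, the base classifiers differ slightly from one another, and voting smooths out their individual errors.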

III. THE ADOPTED EXPERIMENTAL METHODOLOGY
This section presents the methodology followed to produce the desired RCPPM. The first and main step in the adopted methodology is obtaining and preparing the dataset that will be used to train the classifier. The next section describes the main characteristics of the collected dataset and the preprocessing applied to it. Once the dataset is preprocessed, it is fed to one of the classification algorithms to produce the prediction model. In this study, several Single and Ensemble classification algorithms have been utilized, as explained in the experiment section. The last step in the adopted methodology is to evaluate the effectiveness of the generated models, in order to identify the "best" model and decide whether it is suitable for future prediction. In this work, the accuracy and Area Under the ROC Curve (AUC) metrics have been utilized for assessing the performance of the constructed prediction models. Accuracy is a simple metric that measures the percentage of samples correctly predicted by the prediction model. The AUC is a robust measure of the overall effectiveness of the prediction model; it is the area under the ROC curve, which plots the true positive rate against the false positive rate [1].
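The two evaluation metrics can be made concrete with a short sketch. Accuracy is computed directly from its definition. The AUC is computed for the binary case using the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. How the multi-class AUC values reported in this paper were aggregated is not stated here, so the sketch does not attempt it. The labels and scores are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def binary_auc(true_labels, scores, positive):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(true_labels, scores) if t == positive]
    neg = [s for t, s in zip(true_labels, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: four samples, scores are the predicted probability of class "a".
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
score_a = [0.9, 0.4, 0.6, 0.2]

print(accuracy(y_true, y_pred))           # 0.75
print(binary_auc(y_true, score_a, "a"))   # 0.75
```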

IV. DATASET DESCRIPTION
This section presents an overview of the main characteristics of the dataset that was used to generate the RCPPM. The dataset was collected using a survey that covers personal demographic data, income level and place preferences. Table I presents the extracted features, with a brief description of each. The main goal is to build a prediction model that predicts the user's preferred restaurant category. Fig. 1 presents the label distribution in the dataset. As shown in the figure, the distribution of the labels is imbalanced, so preprocessing is required to resolve this issue and generate an effective prediction model. The well-known Synthetic Minority Oversampling TEchnique (SMOTE) [21] was adopted. SMOTE is an oversampling technique that produces artificial minority class samples.

V. RESULTS
In this section, the results obtained from the experiments are presented. As noted earlier in the introduction to this paper, two categories of classification algorithms were utilized to generate the desired RCPPM: (i) Single classification and (ii) Ensemble classification. With respect to the first category, six well-known classification algorithms were used to produce the RCPPM: (i) Naïve Bayes (NB), (ii) Decision Tree (DT), (iii) Rule-Based (RB), (iv) k-Nearest Neighbor (kNN), (v) Artificial Neural Network (ANN) and (vi) Support Vector Machine (SVM). Regarding the second category, three algorithms were utilized to generate the RCPPM: (i) Bagging Ensemble Classification, (ii) Boosting Ensemble Classification and (iii) Heterogeneous Ensemble Classification. The well-known 10-fold cross validation technique was adopted to divide the dataset into training and testing sets and to obtain more reliable estimates of classification performance. All classification experiments in this work were performed using the WEKA data mining tool [22].
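Two of the steps mentioned above, minority oversampling and 10-fold cross validation, can be sketched as follows. The sketch is simplified and rests on assumptions: it handles numeric features only, and it interpolates toward the single nearest minority neighbor, whereas SMOTE as published picks one of several nearest neighbors at random. The experiments in this paper used WEKA rather than this code, and all function names are hypothetical.

```python
import random


def smote_like_sample(minority, rng):
    """Create one synthetic point between a minority sample and its nearest minority neighbor."""
    i = rng.randrange(len(minority))
    base = minority[i]
    others = minority[:i] + minority[i + 1:]
    neighbor = min(others, key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbor))


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices once, then yield (train, test) index lists for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical minority-class samples with two numeric features.
minority_samples = [(1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
synthetic = smote_like_sample(minority_samples, random.Random(1))

splits = list(k_fold_indices(25, k=10))
print(synthetic, len(splits))
```

Each sample appears in exactly one test fold, so every sample is used for testing once and for training in the remaining folds.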
We commence with the results obtained from using single classification algorithms to construct the RCPPM. Table II presents the results obtained with the six well-known classification algorithms. From the table it can be observed that the DT and NB classifiers achieved identical results, which were also the highest (accuracy = 86.92% and AUC = 0.98). Because Ensemble model effectiveness is highly affected by the base classifiers [11], the Ensemble classification experiments were conducted using only DT and NB classifiers as base classifiers. Table III presents the results obtained from using ensemble classification to generate the RCPPM. Here, Bagging (DT) refers to utilizing a set of DT classifiers as the base classifiers within the Bagging Ensemble, while Bagging (NB) refers to using Bagging Ensemble classification with NB classifiers as the base classifiers. Similarly, Boosting (DT) refers to using Boosting Ensemble classification with DT classifiers as the base classifiers, while Boosting (NB) uses NB classifiers as the base classifiers. Regarding Heterogeneous Ensemble classification, a combination of DT and NB classifiers was utilized to generate the model. Two Heterogeneous classification approaches were utilized: the first adopts "Majority Voting" to combine the results from the base classifiers, while the second uses "Average Probability" to output the final prediction. From the table, Bagging Ensemble classification outperforms Boosting and Heterogeneous Ensemble classification, in terms of average accuracy and AUC, for generating the RCPPM. The worst results were obtained when using Boosting Ensemble classification to generate the RCPPM.
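The two combination rules used for the Heterogeneous Ensemble can give different answers on the same base-classifier outputs, as the following minimal sketch illustrates. The three probability distributions and the category names are hypothetical and are not taken from the reported experiments.

```python
from collections import Counter


def majority_vote(predicted_labels):
    """Return the label predicted by the largest number of base classifiers."""
    return Counter(predicted_labels).most_common(1)[0][0]


def average_probability(distributions):
    """Average the per-class probabilities across classifiers and return the top class."""
    classes = distributions[0].keys()
    averaged = {c: sum(d[c] for d in distributions) / len(distributions) for c in classes}
    return max(averaged, key=averaged.get)


# Hypothetical class-probability outputs of three base classifiers for one user.
outputs = [
    {"cafe": 0.55, "grill": 0.45},
    {"cafe": 0.55, "grill": 0.45},
    {"cafe": 0.05, "grill": 0.95},
]
hard_labels = [max(d, key=d.get) for d in outputs]  # each classifier's own top class

print(majority_vote(hard_labels))    # cafe: two of the three classifiers prefer it
print(average_probability(outputs))  # grill: the confident third classifier dominates
```

Majority voting ignores how confident each classifier is, while averaging lets a single confident classifier outweigh two uncertain ones.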

VI. CONCLUSION
In this paper, Single and Ensemble classification algorithms have been utilized to generate a prediction model that aims to predict restaurant category preferences. The RCPPM is an intelligent prediction model that helps users decide on the most suitable place to go. The experiments were conducted using a novel dataset that covers personal demographic data, income level and place preferences. The reported experiments show that supervised machine learning can be utilized to generate a high-performance RCPPM. Using an ensemble of classifiers enhanced the classification effectiveness of the RCPPM. Moreover, Bagging Homogeneous Ensemble classification outperformed Single and Heterogeneous Ensemble classification. Although Heterogeneous Ensemble classification can improve classification accuracy by exploiting the power of completely different classifiers, it did not enhance the effectiveness of the RCPPM. A possible reason is the conflicting predictions generated by the different kinds of classifiers. In the future, the authors plan to investigate the effect of using different features on predicting restaurant category preferences.