An Ensemble of Fine-Tuned Heterogeneous Bayesian Classifiers

Bayesian network (BN) classifiers use different structures and different training parameters, which leads to diversity in classification decisions. This work empirically shows that building an ensemble of several fine-tuned BN classifiers increases the overall classification accuracy. The accuracy of the constituent classifiers is achieved by fine-tuning each classifier, and the diversity is achieved by using different BN classifiers. The proposed ensemble combines a Naive Bayes (NB) classifier, five different models of Tree Augmented Naive Bayes (TAN), and four different models of Bayesian Augmented Naive Bayes (BAN). This work also proposes a new Distance-based Diversity Measure (DDM) and uses it to analyze the diversity of the ensembles. The ensemble of fine-tuned classifiers achieves better average classification accuracy than any of its constituent classifiers or the ensemble of un-tuned classifiers. Moreover, the experiments show significantly better results for many data sets.

Keywords: Ensemble classifier; Bayesian Network (BN) classifiers; Fine-tuned BN classifiers; Stacking; Diversity


I. INTRODUCTION
Bayesian network classifiers are probabilistic models that encode the conditional independence relationships between the attributes in different ways. There are many learning algorithms to build TAN and BAN structures, such as TAN search, K2 search, Tabu Search, Hill Climber Search and Repeated Hill Climber Search. These search algorithms yield different TAN and BAN classifiers.
Building ensembles of classifiers is a powerful method for obtaining better classification accuracy by combining the classifications of multiple classifiers [1]. Boosting [2] [3] and bagging [2] [3] are the two most commonly used methods for building ensembles of homogeneous classifiers. On the other hand, stacked generalization (stacking) [4] and ensemble selection [5] are suitable for building ensembles of heterogeneous classifiers.
Diversity and the accuracy of the base classifiers are important factors in achieving a powerful ensemble of classifiers. It would be meaningless to combine several classifiers that make the same predictions. The intuition is that if many classifiers make errors on different instances, the combination of these classifiers can reduce the overall error and improve the performance of the ensemble system [6]. The main advantage of combining different BN classifiers in an ensemble is that it is unlikely that all classifiers will make the same mistake. It would also be meaningless to combine classifiers that are too weak. Therefore, in order to build an ensemble of classifiers with better accuracy, we need to combine relatively accurate and diverse classifiers.
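The intuition above can be illustrated with a small calculation. The following is a minimal sketch, not part of the original work: it assumes a two-class problem, classifiers that err independently of one another, and an identical error rate for every classifier (real BN classifiers are correlated, so the benefit in practice is smaller). The function name `majority_vote_error` is hypothetical.

```python
from math import comb


def majority_vote_error(n_classifiers: int, error_rate: float) -> float:
    """Probability that more than half of the classifiers are wrong.

    Assumptions (illustrative only): binary classification, independent
    errors, and the same error rate for every classifier. An exact tie
    (possible when n_classifiers is even) is not counted as an error.
    """
    threshold = n_classifiers // 2 + 1  # smallest number of wrong votes that wins
    return sum(
        comb(n_classifiers, k)
        * error_rate ** k
        * (1 - error_rate) ** (n_classifiers - k)
        for k in range(threshold, n_classifiers + 1)
    )


if __name__ == "__main__":
    # Compare a single classifier with ensembles of growing size.
    for n in (1, 3, 11):
        print(n, majority_vote_error(n, 0.3))
```

Under these assumptions the ensemble error falls below the individual error rate whenever that rate is below 50%, and it stays at 50% when the base classifiers are no better than chance, which is why both accuracy and diversity matter.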
Diversity of classifiers is often achieved by using a single learning algorithm with different data sets (using sampling), training parameters, or subsets of features [1] [7] [8]. These methods are considered homogeneous methods because they use the same learning algorithm. On the other hand, an ensemble might consist of a group of classifiers, each built using the same training data but a different learning algorithm [7] [9]. Ensembles of heterogeneous classifiers might be more suitable if the learning algorithms are stable in the sense that a small change in the training data does not lead to a substantially different classifier. Heterogeneity might be better at achieving diversity in this case. Naive Bayes (NB) and Tree Augmented Naive Bayes (TAN) are known to be stable algorithms [10] [11].
We empirically show that an ensemble of several fine-tuned BN classifiers, namely fine-tuned Naive Bayes (FTNB) [12], fine-tuned TAN (FTTAN) [13] and fine-tuned BAN (FTBAN) classifiers, achieves better classification accuracy for many data sets than an ensemble of un-tuned classifiers or any of its constituent classifiers. We also propose a Distance-based Diversity Measure (DDM) and use it to analyze our results. Since the error rate of the different BN classifiers is below 50%, we expect that the ensemble classifier will yield better classification accuracy than the constituent classifiers. In this research, we achieve diversity by using different types of BN classifiers (an NB classifier, TAN classifiers, and BAN classifiers) to construct an ensemble of classifiers. Moreover, we use different models of TAN and BAN by using different search algorithms to build their structures. Also, by using fine-tuned classifiers [12] [13], we construct an ensemble of relatively accurate classifiers. This work improves the classification accuracy of BN classifiers by building three different ensemble classifiers: 1) the original 10 un-tuned BN classifiers (NB, five models of TAN and four models of BAN), 2) the corresponding 10 fine-tuned BN classifiers, and 3) a combination of all twenty BN classifiers. We also compare the results of these three ensembles.
This work is structured as follows: in Section II we review the related work on building ensembles of BN classifiers. In Section III, we present our ensembles of BN classifiers. Section IV presents the experimental results and a comparison between the different ensembles. Section V is the conclusion.

II. RELATED WORK
Diversity among individual classifiers is important in order for an ensemble to achieve better accuracy than any of its constituent classifiers. Usually, diversity of the base classifiers is achieved by training a single learning algorithm on different data sets (bootstrap resampling) [11] [14] [15] [16], with different parameters [14], or with different features [1] [7] [8]. However, some works achieve diversity by training different learning algorithms on the same data.
Ma and Shi [14] propose a TAN learning algorithm called Random Tree-Augmented Naive Bayes (RTAN) that generates different TAN classifiers to be combined in an ensemble classifier. The algorithm builds a TAN model by selecting the arcs whose conditional mutual information is larger than a certain threshold value. The RTAN algorithm builds different TAN models by using different threshold values and different start edges. RTAN is trained on different training subsets, and the resulting TAN classifiers are then combined into a TAN ensemble classifier using majority voting. Their experimental results show that the bagging Multi-TAN ensemble classifier has higher classification accuracy than the standard TAN classifier. Also, Shi et al. [11] used the RTAN algorithm for boosting MultiTAN, which shows higher classification accuracy than the standard TAN classifier. Sun and Zhou [15] [16] used a boosting technique that is characterized by the way in which the hypothesis weights are selected and by the instance weight update step. They used boosting to combine multiple TAN classifiers and compared it with Boosting-BAN classifiers. Their experimental results show that Boosting-BAN has higher classification accuracy than Boosting-MultiTAN on noise-free data. Moreover, Sun and Zhou [17] built an ensemble combining Boosting-BAN and Boosting-MultiTAN using the sum voting methodology. The sum rule adds all confidence scores of the sub-ensemble predictions for each class, and the class with the highest sum wins the election. They report that their proposed ensemble classifier is significantly more accurate than the TAN, BAN, Boosting-BAN and Boosting-MultiTAN methods.
Tsymbal et al. [8] developed an ensemble of NB classifiers that randomly samples the feature space. They found that their ensemble of classifiers performed better than a single Naive Bayes classifier. Lee and Cho [7] combined three different classifiers to build an ensemble. They created a General Bayesian network (GBN) to identify the variables inside the Markov blanket of the GBN's class node, and then used those selected variables to create a GBN-assisted ensemble by combining GBN, decision tree, and/or SVM using voting and stacking combination strategies. They found that the ensemble systems generally improved the prediction accuracy. Sakkis et al. [18] applied a stacked generalization approach to anti-spam filtering. They combined a memory-based classifier and a Naive Bayes classifier in an ensemble classifier. Their ensemble improved the performance of the anti-spam filter and outperformed the two base classifiers. They report that the improvement of the ensemble of classifiers is due to the high diversity of the two base classifiers.
Jing et al. [19] construct an ensemble Bayesian belief network (BBN) and exploit the TAN learning algorithm to build a BBN structure. They combined parameter boosting and structure learning to improve the classification accuracy of BBN classifiers. Their algorithm goes through a fixed number of iterations and stops if the training error increases. At the beginning of each iteration, a training set and its corresponding weights for the data points are given to the TAN algorithm to build a BBN classifier. The TAN algorithm is used to build the base classifier; it starts with an empty set and adds the i edges with the highest mutual information to a naive BBN. The training error of the resulting TAN classifier is then used to determine the weight of the test data points in subsequent iterations. According to their results, their boosted BBNs have comparable or lower average testing error compared with NB and TAN.
This work has an advantage over the previous works in that it uses fine-tuned BN classifiers. The fine-tuning process addresses the unreliable estimation of the attributes' conditional probabilities caused by the lack of data, and improves the accuracy of BN classifiers by finding more accurate estimates of the probability terms.

III. ENSEMBLE BAYESIAN NETWORK CLASSIFIERS
In this work, we build an ensemble of BN classifiers. Each classifier in the ensemble is trained using the same data. Stacking [4], which combines classifiers built by different learning algorithms, is employed to build three different ensembles. The main idea behind stacking is to use the classifications of a set of base classifiers (level-0), estimated by using cross-validation, to learn a meta classifier (level-1) which gives the final prediction [20].
In this research, stacking splits the data set into two disjoint parts (using 10-fold cross-validation), trains all BN base learners on the first part, and then tests the base learners on the second part. The predictions of all BN base classifiers are combined by using simple plurality voting to produce an ensemble of BN classifiers. Diversity is achieved by using three different types of BN classifiers (NB, TAN and BAN). Moreover, we exploit the structure learning algorithms to build five different TAN classifiers and four different BAN classifiers. The search algorithms that were used to build the different TAN classifiers are: TAN search, K2 search, Tabu Search, Hill Climber Search and Repeated Hill Climber Search. The last four search techniques were also used to build four different BAN classifiers. We also used their corresponding fine-tuned classifiers: fine-tuned NB (FTNB) [12], fine-tuned TAN (FTTAN) [13] and fine-tuned BAN (FTBAN). Three ensemble classifiers were built. The first one combines 10 BN classifiers (NB, five models of TAN and four models of BAN). The second ensemble classifier combines the 10 corresponding fine-tuned classifiers, and the last one combines all 20 BN classifiers (fine-tuned and un-tuned).
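The plurality-voting step described above can be sketched as follows. This is an illustrative sketch rather than the authors' Weka implementation: the function names are hypothetical, and the tie-breaking rule (the tied label predicted by the earliest base classifier wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the label predicted by the largest number of base classifiers.

    `predictions` holds one predicted label per base classifier for a single
    instance. Tie-breaking is an assumption of this sketch: among the tied
    labels, the one that appears first in `predictions` is returned.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def ensemble_predict(per_classifier_predictions):
    """Combine predictions over a whole test fold.

    `per_classifier_predictions[m][i]` is the label that base classifier m
    assigns to instance i; the result has one ensemble label per instance.
    """
    return [plurality_vote(list(votes)) for votes in zip(*per_classifier_predictions)]


if __name__ == "__main__":
    # Three hypothetical base classifiers, four test instances.
    nb = ["yes", "no", "no", "yes"]
    tan = ["yes", "yes", "no", "no"]
    ban = ["no", "yes", "no", "no"]
    print(ensemble_predict([nb, tan, ban]))
```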

A. Distance-Based Diversity Measure (DDM)
Since the diversity of the base classifiers has a direct effect on the ensemble's classification accuracy, there is a need to be able to measure it. Kuncheva and Whitaker [21] compared several measures of diversity and concluded that all measures had approximately equally strong relationships and that they were strongly correlated. Some of their experiments revealed the inadequacy of these measures for predicting the accuracy of the ensemble. The low correlation between these measures on the one hand and the improvement in classification accuracy on the other hand is discouraging. This work proposes a new distance-based diversity measure and uses it to analyze the relationship between the diversity of the base classifiers and the ensemble accuracy.
Suppose we have M classifiers and C classes. If we ignore accuracy, the ideal diverse ensemble would give equal votes to each class. For example, if we have 10 classifiers and 5 classes, the ideal (most diverse) vote vector for the five classes would be (2, 2, 2, 2, 2). In other words, it is the vote vector in which each class gets M/C votes. The least diverse ensemble is the one that has all its constituent classifiers voting for the same class, while all remaining classes have zero votes. In our example, the vote vector for the five classes would be something like (0, 0, 0, 0, 10). A good diversity measure would be based on the distance between the ideal vote vector and the actual vote vector for all instances. A small distance indicates more diverse classifiers and a large distance indicates less diverse classifiers. For an instance i with vote vector $v_i = (v_{i1}, \dots, v_{iC})$, let $d_i$ denote the distance between $v_i$ and the ideal vote vector $(M/C, \dots, M/C)$. This distance should be computed for all N instances in the training set.
The maximum distance $d_{\max}$ that the vote vector of a given instance can reach occurs when C - 1 of the classes get 0 votes and one class gets all M votes. Each of the C - 1 classes with no votes then deviates from its ideal count by $M/C$, and the class that gets all votes deviates by $M - M/C$. Therefore, the maximum total distance over all N instances is $N \cdot d_{\max}$.
The Distance-Based Diversity Measure (DDM) is defined as follows:

$$\mathrm{DDM} = 1 - \frac{\sum_{i=1}^{N} d_i}{N \cdot d_{\max}}$$

Thus, diversity ranges from zero to one, where zero indicates the lowest diversity and one indicates the highest diversity.
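The following sketch shows one way the measure described above could be computed. It rests on an assumption that should be stated loudly: the per-instance distance is taken to be the sum of squared differences between the actual and the ideal vote counts, which is one plausible reading of the description and may differ from the exact distance used by the authors. The function names are hypothetical.

```python
def vote_vector(predictions, classes):
    """Count how many base classifiers voted for each class on one instance."""
    return [sum(1 for p in predictions if p == c) for c in classes]


def distance_to_ideal(votes):
    """Distance between a vote vector and the ideal vector (M/C, ..., M/C).

    ASSUMPTION: squared Euclidean distance. The source text does not pin
    down the exact metric, so treat this choice as illustrative.
    """
    m = sum(votes)   # number of classifiers M
    c = len(votes)   # number of classes C
    ideal = m / c
    return sum((v - ideal) ** 2 for v in votes)


def ddm(vote_vectors):
    """Distance-based diversity in [0, 1]; 1 is most diverse, 0 is least."""
    n = len(vote_vectors)
    m = sum(vote_vectors[0])
    c = len(vote_vectors[0])
    # Worst case: C - 1 classes get 0 votes and one class gets all M votes.
    max_distance = (c - 1) * (m / c) ** 2 + (m - m / c) ** 2
    total = sum(distance_to_ideal(v) for v in vote_vectors)
    return 1 - total / (n * max_distance)


if __name__ == "__main__":
    # The two extreme vote vectors from the text: 10 classifiers, 5 classes.
    print(ddm([[2, 2, 2, 2, 2]]))    # perfectly spread votes
    print(ddm([[0, 0, 0, 0, 10]]))   # unanimous votes
```

With this choice of distance, the ideal vector (2, 2, 2, 2, 2) yields a diversity of one and the unanimous vector (0, 0, 0, 0, 10) yields zero, matching the range stated in the text.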

IV. EXPERIMENTS AND RESULTS
In all experiments, we used 40 data sets obtained from the UCI repository [22]. The BAN models used in our experiments had a maximum of three parents for each attribute node. All ordinal attributes were discretized using Fayyad et al.'s [23] supervised discretization method, as implemented in Weka. The missing values in the data sets were simply replaced by the most common values. Ten-fold cross-validation was used in all experiments. All experiments were implemented in the Weka framework and used as many of the Weka classes as possible. We built three ensembles of classifiers that are based on different types of BN classifiers (NB, TAN and BAN). The ensemble of classifiers uses a simple majority (plurality) voting technique to classify instances.
We used the classification formulas of Eq. (5) for NB and Eq. (6) for TAN and BAN classifiers, as proposed by Friedman et al. [24].
We used the Laplace estimator to estimate all probability values, where K is the number of different values of x and Alpha is a small positive value.
In our experiments, we used the Weka simple estimator with Alpha = 0.5 to estimate the conditional probabilities of NB (which is the default value used by Weka for NB). For TAN and BAN, we experimented with different values of Alpha and chose Alpha = 0.2, which gave us the best results.
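As a rough illustration of the role of Alpha, the sketch below uses a common form of the Laplace-style estimate, (count + Alpha) / (total + K * Alpha). This form is an assumption consistent with the description of K and Alpha above; it is not copied from the original equation, and the function name is hypothetical.

```python
def smoothed_probability(count: float, total: float, num_values: int, alpha: float) -> float:
    """Laplace-style estimate of P(x) with pseudo-count alpha per value.

    ASSUMED form: (count + alpha) / (total + K * alpha), where K
    (`num_values`) is the number of different values of x. Every value gets
    a non-zero probability even when its observed count is zero.
    """
    return (count + alpha) / (total + num_values * alpha)


if __name__ == "__main__":
    # A binary attribute observed 3 times as "a" and 7 times as "b".
    for alpha in (0.5, 0.2):  # the two Alpha values mentioned in the text
        p_a = smoothed_probability(3, 10, 2, alpha)
        p_b = smoothed_probability(7, 10, 2, alpha)
        print(alpha, p_a, p_b, p_a + p_b)
```

A smaller Alpha keeps the estimates closer to the observed relative frequencies, while a larger Alpha pulls them toward the uniform distribution, which matters most when the counts are small.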
Tables 1 to 5 show the results of our three ensembles of BN classifiers. They also compare each ensemble with its individual base classifiers. The last four rows of each table show the average values (classification accuracy over the 10 folds and ensemble diversity), the number of data sets with better results, and the number of data sets with significantly better results at the 95% and 90% confidence levels. A paired t-test with confidence levels of 95% and 90% was used to determine whether the differences were statistically significant. The better results are highlighted in bold in the tables. The significant results at the 95% confidence level are highlighted in bold and underlined, while the significant results at the 90% confidence level are double underlined.
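The significance test mentioned above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' procedure: the test is assumed to be two-sided and applied to the ten per-fold accuracies of two classifiers, the critical values are the standard t-table entries for 9 degrees of freedom, and all names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided critical values of Student's t for 9 degrees of freedom
# (10 folds). Assumption: the test in the text is two-sided.
T_CRITICAL_DF9 = {0.95: 2.262, 0.90: 1.833}


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the paired differences between two lists of fold scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)
    if spread == 0:
        # All differences are identical: no variation to test against.
        if mean(diffs) == 0:
            return 0.0
        return float("inf") if mean(diffs) > 0 else float("-inf")
    return mean(diffs) / (spread / sqrt(len(diffs)))


def is_significant(scores_a, scores_b, confidence=0.95):
    """True if the 10-fold difference is significant at the given level."""
    if len(scores_a) != 10 or len(scores_b) != 10:
        raise ValueError("critical values in this sketch assume 10 folds")
    return abs(paired_t_statistic(scores_a, scores_b)) > T_CRITICAL_DF9[confidence]


if __name__ == "__main__":
    # Hypothetical per-fold accuracies of an ensemble and one base classifier.
    ensemble = [0.74, 0.71, 0.73, 0.75, 0.72, 0.70, 0.74, 0.73, 0.72, 0.71]
    base = [0.70, 0.69, 0.71, 0.72, 0.70, 0.69, 0.70, 0.71, 0.69, 0.70]
    print(paired_t_statistic(ensemble, base), is_significant(ensemble, base))
```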

A. Stacking BN classifiers
The first experiment combined ten BN classifiers: an NB, five TAN classifiers and four BAN classifiers. Different structure learning algorithms were used to build the different TAN and BAN classifiers. The five different TAN classifiers are distinguished by using the name of the search algorithm used to build them as a postfix. Thus, we have TAN-TAN search, TAN-K2, TAN-tabuSearch, TAN-HillClimber and TAN-RepeatedHillClimber. In the same way, we denote the four different BAN classifiers as BAN-K2, BAN-tabuSearch, BAN-HillClimber and BAN-RepeatedHillClimber. The ensemble of the ten BN classifiers is called EBN-10.
Table 1 shows the results of EBN-10 and the results of each of the constituent classifiers. The table shows that the average classification accuracy of EBN-10 is better than the average accuracy of any of the constituent classifiers. The average accuracy of the ensemble classifier is 71.69%, while the average accuracy of the constituent classifiers ranges from 64.92% to 70.88%. Moreover, the ensemble classifier outperforms all constituent classifiers in terms of the number of data sets for which it achieves better and significantly better results at the 95% confidence level. EBN-10 outperformed NB on 20 data sets; ten of these results are significantly better. Regarding the TAN models, EBN-10 outperformed TAN-TAN search, TAN-K2 and TAN-tabuSearch on 19 data sets each; five, four and eight of these results, respectively, are significantly better. EBN-10 also outperforms TAN-HillClimber on 15 data sets, five of them significantly, and TAN-RepeatedHillClimber on 21 data sets, seven of them significantly. The superiority of EBN-10 is even more pronounced against the BAN models. EBN-10 outperformed BAN-K2 and BAN-tabuSearch on 26 data sets each; 19 and 16 of these results, respectively, are significantly better. Moreover, EBN-10 outperforms BAN-HillClimber on 24 data sets, 18 of them significantly, and BAN-RepeatedHillClimber on 27 data sets, 19 of them significantly.

B. Stacking Fine-Tuned BN Classifiers
In the second experiment, we constructed an ensemble of the same classifiers after fine-tuning them. We used the fine-tuned NB (FTNB) [12] algorithm, its adapted version (FTTAN) [13] to fine-tune the TAN learning algorithm, and fine-tuned BAN (FTBAN). Fine-tuning each classifier improves the classification accuracy by finding more accurate estimates of the probability terms. The enhanced accuracy of the BN classifiers encouraged us to build an ensemble of these fine-tuned classifiers. The ensemble of the 10 fine-tuned BN classifiers is called EFTBN-10.
Table 2 shows the results of EFTBN-10 and the results of each of the constituent classifiers. The average classification accuracy of EFTBN-10 is better than that of all individual FTBN classifiers. The average accuracy of the fine-tuned ensemble classifier is 72.48%, while the average accuracy of the 10 FTBN classifiers ranges between 66.08% and 72.01%. Furthermore, the fine-tuned ensemble classifier outperforms all individual FTBN classifiers in the number of data sets with better and significantly better results at the 95% confidence level. EFTBN-10 outperformed FTNB on 26 data sets; eight of these results are significantly better. Also, EFTBN-10 outperformed FTTAN-TAN search on 23 data sets; six of these results are significantly better. Moreover, EFTBN-10 outperformed FTTAN-K2 and FTTAN-HillClimber on 18 data sets each; four and six of these results, respectively, are significantly better. Also, EFTBN-10 outperformed FTTAN-TabuSearch and FTTAN-RepeatedHillClimber on 22 data sets each; 10 and six of these results, respectively, are significantly better.
The improvements of EFTBN-10 are even more obvious compared with the fine-tuned BAN classifiers (FTBAN). EFTBN-10 is better than FTBAN-K2 for 27 data sets; 17 of these results are significantly better. EFTBN-10 also achieved better results than FTBAN-TabuSearch for 27 data sets; 12 of these results are significantly better. Moreover, EFTBN-10 outperformed FTBAN-HillClimber and FTBAN-RepeatedHillClimber on 27 and 30 data sets; 19 and 16 of these results, respectively, are significantly better. The superiority of EFTBN-10 is even more obvious at the 90% confidence level (see the last row of Table 2).

C. Stacking BN classifiers and their corresponding fine-tuned classifiers
In the third experiment, we built an ensemble by combining the previous twenty BN classifiers (the 10 BN classifiers and their corresponding fine-tuned BN classifiers). We call this ensemble EBN-20.
Table 3 and Table 4 show the results of EBN-20 compared to the results of each of the constituent classifiers. The result of this ensemble is a compromise between the previous two ensembles. The tables show that the average classification accuracy of EBN-20 is 71.56%, which is better than the average accuracy of any of its 20 constituent classifiers except for FTTAN-TAN and FTTAN-K2. This result is not surprising, because the TAN search and K2 search algorithms have exhibited excellent performance in data mining [25] [26] and the fine-tuning process makes them even better. The degradation of the average accuracy of EBN-20 is probably because EBN-20 combines fine-tuned and non-fine-tuned classifiers, which reduces diversity, as the constituent classifiers are not very different. In terms of the number of data sets with better and significantly better results at the 90% confidence level, EBN-20 outperformed all 20 individual classifiers. EBN-20 also outperformed all of the 20 classifiers with respect to the number of data sets with better and significantly better results at the 95% confidence level, except for the FTTAN-K2 classifier, where it wins on four data sets and loses on five data sets (see Tables 3 and 4 for more details).

D. Comparing the Three Ensembles
Table 5 shows the results of comparing the three ensembles: EBN-10, EFTBN-10, and EBN-20. The table also shows the diversity value for each ensemble. As can be seen in the table, EFTBN-10 outperforms EBN-10 with respect to the average classification accuracy and the number of data sets for which it achieves better and significantly better results. EFTBN-10 achieves on average 72.47% classification accuracy, while EBN-10 achieves 71.69%. EFTBN-10 also achieves significantly better results for 6 data sets and worse results for only 1 data set. EFTBN-10 outperforms EBN-10 because its constituent classifiers, namely the fine-tuned classifiers, are more accurate than the constituent classifiers of EBN-10. In fact, the proposed diversity measure shows that both ensembles have the same average diversity of 0.44.
Comparing EFTBN-10 with EBN-20 shows that EFTBN-10 also outperforms EBN-20, which has an average classification accuracy of 71.56%. EFTBN-10 also achieves better results than EBN-20 for 13 data sets; 3 of them are significantly better and 2 are significantly worse at the 95% confidence level. At the 90% confidence level, EFTBN-10 achieves better results for 5 data sets and worse results for 2 data sets. This result is a little surprising because EBN-20 contains many more classifiers. It contains the same classifiers as EFTBN-10 in addition to their un-tuned counterparts. This indicates that EBN-20 must have less diversity than EFTBN-10, which is expected because the fine-tuned classifiers and their un-tuned counterparts are not very different classifiers. The proposed diversity measure supports this analysis: EBN-20 has an average diversity of 0.13, while EFTBN-10 has an average diversity of 0.44.
Although EBN-10 has more diversity than EBN-20, it achieves worse results. Its average classification accuracy is 71.56, while the average accuracy of EBN-20 is 71.69. Moreover, EBN-20 achieves better results for 16 data sets, 6 of them significantly better at the 90% confidence level, while EBN-10 achieves better results for 11 data sets, only 2 of them significantly better at the 90% confidence level. This result occurred because EBN-20 contains the 10 fine-tuned versions of the BN classifiers (in addition to their un-tuned counterparts), while EBN-10 contains only the less accurate un-tuned classifiers.

V. CONCLUSION
This work shows that an ensemble of fine-tuned BN classifiers is an effective way to increase the classification accuracy of BN classifiers. It also empirically concludes that the ensemble of the fine-tuned classifiers outperforms an ensemble of un-tuned classifiers. Although the two ensembles have the same average diversity, the ensemble of the fine-tuned classifiers combines more accurate classifiers. However, constructing a larger ensemble that combines the fine-tuned and un-tuned classifiers does not improve the classification accuracy because the combined classifiers are not very different. The work also proposes a distance-based diversity measure and uses it in analyzing the results. The ensemble of classifiers combines different types of BN classifiers (NB, TAN, and BAN). Different learning algorithms that use different search methods were used to build the TAN and BAN classifiers. The variation of the BN classifiers increases the diversity of the ensemble, while using fine-tuned classifiers increases the accuracy of the constituent classifiers. The work compares three different ensembles of BN classifiers. The first ensemble, EBN-10, combines ten un-tuned classifiers; the second, EFTBN-10, combines ten fine-tuned BN classifiers; and the third, EBN-20, combines all 20 of these BN classifiers. The experimental results using 40 data sets and a simple majority voting method show that the ensembles outperform all the individual constituent classifiers. They also show that EFTBN-10 is the superior one because it has more accurate constituents and is more diverse.

VI. FUTURE WORK
As future work, we intend to develop ensembles of BN classifiers that use other types of BN classifiers. It would also be interesting to develop a new fine-tuning algorithm to improve the accuracy of the ensemble's base classifiers. Moreover, different ensemble methods and voting techniques can be used.

TABLE I. EBN-10 ENSEMBLE CLASSIFIER COMPARED TO THE 10 INDIVIDUAL BN CLASSIFIERS

TABLE IV. CONTINUED - FTBN-20 ENSEMBLE CLASSIFIER COMPARED TO THE 20 INDIVIDUAL CLASSIFIERS