Prediction of Prostate Cancer using Ensemble of Machine Learning Techniques

Several diseases affect humans; some are specific to women and some to men. An example of a disease specific to men is Prostate Cancer (PC). Prostate cancer occurs when cells in the prostate gland start to grow uncontrollably. Statistics show that prostate cancer is becoming an epidemic among men. Hence, several research works have tried to solve this problem using various methods. Although numerous medical research works are ongoing in the area, the need to introduce technology to battle the epidemic is paramount. Because of this, researchers have developed several models to help address prostate cancer in men, but the area is still open to contribution. Recently, some researchers have adopted well-established Machine Learning (ML) techniques to predict and diagnose the occurrence of prostate cancer, but issues such as low prediction accuracy, inability to implement the model, and low sensitivity, among others, still linger. This paper approaches these challenges by developing an ensemble model that combines three (3) ML techniques, Support Vector Machine (SVM), Decision Tree (DT), and Multilayer Perceptron (MP), to predict PC in men. The developed model was evaluated using sensitivity, specificity and accuracy as performance metrics, and our results showed a prediction accuracy of 99.06%, a sensitivity of 98.09%, and a specificity of 99.54%, which is a relative improvement on the existing systems.

Keywords: Prostate cancer; machine learning; support vector machine; decision tree; multilayer perceptron; diseases


I. INTRODUCTION
Cancer is considered one of the most dangerous diseases in the world because it is responsible for around 13% of all deaths worldwide [1]. Cancer usually starts as primary to a specific organ in the body and later metastasizes to other parts. A common type of cancer is Prostate Cancer (PC). Prostate cancer is among the most prevalent cancers and a leading cause of cancer death among men worldwide, reported as second only to leukaemia [2]. Prostate cancer, which is medically referred to as carcinoma of the prostate, begins when cells in the prostate gland start to grow uncontrollably. Research in [3] explained that prostate cancer begins when healthy cells in the prostate gland change and grow out of proportion, thereby forming a mass called a tumour. Recent developments in artificial intelligence are now being applied to various fields in medicine and science generally. One such application is the use of Machine Learning techniques to address prostate cancer. Although several researchers have tried to predict and diagnose PC in men using well-established ML techniques individually, research in [4], [5], and [6], among others, shows that issues of low prediction accuracy and sensitivity still linger. This research approaches these challenges by combining three (3) well-established machine learning techniques (Decision Tree, Support Vector Machine and Multilayer Perceptron) to form an ensemble model that aims to address the recurrent issues associated with the use of single Machine Learning techniques.
The rest of this paper is organized as follows: Section I introduces prostate cancer and justifies the need to carry out this research, Section II reviews related works that have attempted to predict and address issues relating to PC, Section III explains the methodology, Section IV presents the results and discussion, and Section V concludes the paper.

II. RELATED WORKS
The prevalence of prostate cancer is increasing by the day. Statistics show that almost one-third of men over 50 years old will be diagnosed with prostate cancer during their lifetime [7]. The author in [8] defined prostate cancer as the cancer that occurs in the prostate, a small walnut-shaped gland in men that produces the seminal fluid that nourishes and transports sperm. It is recommended that men have a prostate examination by age 50 [9]. Prostate testing starts with a Prostate Specific Antigen (PSA) test, and a core biopsy is recommended should the patient have a PSA value higher than normal [7]. Biopsy is the gold standard for cancer diagnosis.
Although several works have tried to address the prostate cancer epidemic using various medical approaches, the advent of technology has also brought about the development of some computer-aided solutions. An example is [7], where the authors developed a computer-aided diagnostic tool that uses image processing techniques for efficient PC diagnosis and prognosis. The authors collected images of the prostate gland as shown in Fig. 1, and then separated the images into various portions to diagnose prostate cancer. The introduction of imaging and machine learning techniques to acquire, process and analyse images from biopsies is of utmost importance [10], because some other diseases imitate prostate cancer. An example is Benign Prostatic Hyperplasia (BPH), which occurs when the prostate begins to press against the urethra as a result of growth, thereby causing urinary problems [11]. However, the occurrence of prostate cancer is common among men aged 50 and above.
It is essential to trust predictions and diagnoses made using artificial intelligence [12]. Therefore, the accuracy of ML predictions is very important. Research in [11] proposed the use of an Artificial Neural Network to detect early signs of prostate cancer, but the model could not record perfect accuracy. The author in [13] also applied artificial neural networks (ANN) with back propagation to predict prostate cancer recurrence in patients, but the evaluation could not achieve optimal accuracy. Research in [14] also developed a model using a Fisher Linear Classifier to predict recurrence of prostate cancer in men, and their model achieved an accuracy of 93%. Zhao et al. [15] proposed a Penalized Logistic Regression technique based on top-scoring pair (TSP) as a classification model to predict prostate cancer, but perfect accuracy was also not recorded. The authors in [16] proposed a prostate cancer predictive model using the Decision Tree algorithm. The research established Decision Tree as a useful data mining algorithm for predicting prostate cancer, but the model is not reliable due to low accuracy. Ge et al. [17] proposed a prostate cancer predictive model using Logistic Regression and Artificial Neural Network, but the individual accuracy of the algorithms stood at 84.02% and 85.09% respectively. Takeuchi et al. [18] proposed a prostate cancer prediction system on prostate biopsy using a Multilayer Artificial Neural Network (ANN), but the system predicted with an accuracy of only 71.6%, which can be attributed to the insufficient amount of data used for the research. In order to combat the recurrent issue of accuracy, our research proposes an ensemble model that combines three ML techniques. The method and functionality of our model are discussed in the next section.

III. METHODOLOGY
The architecture of the model is shown in Fig. 2. The architecture shows the components of the developed model. The functionality of each component is explained in detail below:

A. Datasets (Prostate and Non-Prostate Cancer)
The dataset used in the research is obtainable from http://github.com/selva86/datasets/masters/prostate.csv. The obtained data contains about one thousand, nine hundred and forty (1,940) study patients, which make up the instances of the data. Each instance consists of 10 attributes, including a class label indicating whether the instance is a benign (0) or a prostate cancer (1) sample. The attribute values are all numeric; Table 1 shows the description of the data attributes.

B. Data Normalization
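The obtained data was normalized using z-score normalization in order to make training less sensitive to the scale of features. The z-score converts each feature to a distribution with mean 0 and standard deviation 1 using equation (1):

$$z = \frac{x - \mu}{\sigma} \tag{1}$$

where $x$ is the data value to be normalized, $\mu$ represents the mean of the data values in the feature category, and $\sigma$ represents the standard deviation of the data values in the same feature category.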
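The normalization step can be illustrated with a minimal sketch. It assumes the population standard deviation is used and that a constant feature is mapped to zeros; neither detail is stated in the paper, and the sample values are hypothetical.

```python
import statistics

def z_score_normalize(values):
    """Return the z-scores (x - mean) / std for one feature column."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std (an assumption)
    if sigma == 0:
        # A constant feature has no spread; map it to zeros (an assumption).
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical feature values, for illustration only.
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
normalized = z_score_normalize(feature)
print(normalized)
```

After this transformation the feature has mean 0 and standard deviation 1, so features measured on different scales contribute comparably during training.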

C. Data Training and Testing
The normalized data was divided into training and testing sets using a 67%-33% split ratio, as shown in Table 2. The training set was used to train the classifiers, and the testing set was used to evaluate the predictive models. The classification algorithm used in this research for predicting the presence of prostate cancer is an ensemble of three (3) classifiers: Support Vector Machine (SVM), Decision Tree (DT), and Multilayer Perceptron (MP). The ensemble algorithm combines the predictions $P_1$, $P_2$ and $P_3$ of the three classifiers (SVM, DT and MP), respectively, to make the final prediction as follows. The training set of prostate cancer data is given as:

$$D = \{(x_i, y_i)\}_{i=1}^{N} \tag{2}$$

where $D$ is the training set of prostate data, $x_i$ is the input for the $i$-th prostate data sample described by the set of attributes $a_1, a_2, a_3, \ldots$, $y_i \in \{0,1\}$ is its corresponding class label indicating whether the sample is a benign sample ($y_i = 0$) or a prostate cancer sample ($y_i = 1$), and $N$ represents the total number of data samples.
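The 67%-33% split can be sketched as below. The function name, the random seed, and the choice to round the test share up are assumptions made for illustration; rounding up is chosen because it yields the 641 test instances reported in Section IV for 1,940 samples.

```python
import math
import random

def train_test_split_indices(n_samples, test_fraction=0.33, seed=0):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    n_test = math.ceil(n_samples * test_fraction)  # round the test share up
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(1940)
print(len(train_idx), len(test_idx))
```

The two index lists are disjoint, so no test instance is seen during training.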
The first classifier $C_1$, a linear SVM, makes its prediction $P_1$ as either ($y = 0$) or ($y = 1$) by creating a decision boundary (hyperplane) that linearly separates the two classes using equation (3):

$$(w \cdot x) + b = 0 \tag{3}$$

such that the two classes fall on opposite sides of the hyperplane, where $x$ denotes an instance of a prostate cancer sample, $w$ represents the weight vector, and $b$ is the bias.
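As a rough illustration of how a separating hyperplane produces a class label, the sketch below assigns class 1 to instances with a non-negative score and class 0 otherwise. The weights, bias and sign convention are hypothetical; in practice $w$ and $b$ are learned from the training set.

```python
def svm_predict(w, b, x):
    """Classify x by the side of the hyperplane w.x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0

# Hypothetical learned parameters for a two-attribute problem.
w = [0.8, -0.5]
b = -0.2

print(svm_predict(w, b, [1.0, 0.5]))  # score = 0.35, positive side
print(svm_predict(w, b, [0.0, 1.0]))  # score = -0.7, negative side
```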
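In the attribute-selection measure used by the second classifier $C_2$ (Decision Tree) to make its prediction $P_2$, $v$ is a value in attribute $A$, $Values(A)$ represents all possible values in $A$, $S_v$ represents the instances of $S$ for which $A$ has value $v$, $|S_v|$ and $|S|$ represent the number of instances in $S_v$ and $S$ respectively, $p(c_i)$ represents the probability of class $c_i$ in $S$, $m$ is the distinct number of class values, and $k$ is the number of outcomes of test attribute $A$. The process is continued over each subset $S_j$, where $1 \leq j \leq k$, until all elements in each final subset fall under the same class.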
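The quantities defined above match those of an entropy-based split criterion. The sketch below assumes information gain (as in ID3) is the measure; the paper's exact formula is not recoverable from the text, so this is an illustration rather than the authors' implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of the class labels in a set S."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Entropy of S minus the weighted entropy of the subsets S_v."""
    total = len(labels)
    subsets = {}
    for v, y in zip(attribute_values, labels):
        subsets.setdefault(v, []).append(y)  # group labels by attribute value
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical toy data: one attribute separates the classes, one does not.
labels = [0, 0, 1, 1]
print(information_gain(["low", "low", "high", "high"], labels))
print(information_gain(["a", "b", "a", "b"], labels))
```

An attribute whose values split the set into pure subsets gets the highest gain and is chosen as the test attribute; the procedure then recurses on each subset.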

Multilayer Perceptron
The third classifier $C_3$ makes its prediction $P_3$ as follows: the MLP accepts an input vector $x$, multiplies it by a weight vector $w$, and adds a bias $b$ to produce an output $\hat{y}$ using the following equation:

$$\hat{y} = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

where $n$ is the number of input-output pairs, and $\varphi$ is a non-linear activation function presented in equation (11).
To determine the prediction error of the MLP, the Mean Square Error (MSE) function is applied as follows:

$$E = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $E$ is the error function between the predicted class $\hat{y}$ and the target class $y$. Also, training the MLP by backward propagation involves computing the gradient of the error with respect to $w$ using the chain rule of differentiation as follows:

$$\Delta w \leftarrow \frac{\partial E}{\partial w}$$

where $\Delta w$ is the gradient used for gradient descent and $w$ represents the weight. Thereafter, $w$ is updated in the direction, given by the gradient, that helps minimize the loss.
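The forward pass, MSE and gradient update can be sketched for a single sigmoid unit. This is a simplification under stated assumptions: a full MLP stacks several such units in layers, the sigmoid activation is assumed because equation (11) is not recoverable here, and the learning rate `lr` is a hypothetical parameter that the paper does not mention.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def mse(targets, outputs):
    return sum((y - yh) ** 2 for y, yh in zip(targets, outputs)) / len(targets)

def gradient_step(w, b, xs, ys, lr=0.5):
    """One gradient-descent update of w and b on the MSE loss."""
    n = len(xs)
    grad_w, grad_b = [0.0] * len(w), 0.0
    for x, y in zip(xs, ys):
        y_hat = forward(w, b, x)
        # Chain rule: dE/dz = dE/dy_hat * dy_hat/dz for the sigmoid unit.
        delta = (2.0 / n) * (y_hat - y) * y_hat * (1.0 - y_hat)
        for j, xj in enumerate(x):
            grad_w[j] += delta * xj
        grad_b += delta
    # Move against the gradient to reduce the loss.
    return [wj - lr * gj for wj, gj in zip(w, grad_w)], b - lr * grad_b

# Hypothetical toy data: the label equals the first input.
xs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
ys = [0, 1, 1, 0]
w, b = [0.0, 0.0], 0.0
loss_before = mse(ys, [forward(w, b, x) for x in xs])
w, b = gradient_step(w, b, xs, ys)
loss_after = mse(ys, [forward(w, b, x) for x in xs])
print(loss_before, loss_after)
```

Repeating the update step over many iterations drives the error down, which is the role back propagation plays when training the full network.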

1) Majority Voting Classification
This involves combining the predictions $P_1$, $P_2$ and $P_3$ of the individual base classifiers $C_1$, $C_2$ and $C_3$, respectively, to make a final prediction $P$ by selecting the class label that has been predicted most frequently, using equations (13) and (14), where $d_{t,j}$ represents the decision of the $t$-th classifier given class $j$, $P$ represents the final prediction by the ensemble, and $T$ is the number of base classifiers.

Our ensemble model was implemented using Python 3.7 with the Spyder Python editor via the Anaconda Distribution and the Excel spreadsheet package, on an Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz, 2501 MHz, 2 Core(s), 4 Logical Processor(s).
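The majority voting rule described in this section can be sketched as follows. The function names and the example predictions are hypothetical; with three base classifiers and two classes, a tie cannot occur.

```python
from collections import Counter

def majority_vote(votes):
    """Return the class label predicted most often for one sample."""
    return Counter(votes).most_common(1)[0][0]

def ensemble_predict(p1, p2, p3):
    """Combine per-sample predictions of three base classifiers."""
    return [majority_vote(votes) for votes in zip(p1, p2, p3)]

# Hypothetical predictions from SVM, DT and MP on four test samples.
p1 = [0, 1, 1, 0]
p2 = [0, 0, 1, 1]
p3 = [1, 1, 1, 0]
final = ensemble_predict(p1, p2, p3)
print(final)
```

Because the final label needs agreement from at least two classifiers, an error made by a single base model on a sample is outvoted when the other two are correct.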

IV. RESULTS AND DISCUSSION
The confusion matrix of the developed prostate cancer prediction model when applied to the test data is shown in Table III. Out of the 641 test instances, comprising 209 actual prostate cancer instances and 432 non-prostate cancer instances, the model predicted 205 prostate cancer instances correctly and 4 incorrectly, while also predicting 430 non-prostate cancer instances correctly and 2 incorrectly. In all, 635 instances were correctly classified, while 6 were incorrectly classified. In order to test the efficiency of the ensemble model, the dataset was also tested with DT, SVM and MP individually, and the result is presented in Table VI.
The result showed that the developed ensemble model had the highest number of correctly classified instances, with 635 correctly classified and six (6) incorrectly classified instances. MP was also effective, as it correctly classified 626 instances and misclassified 15 instances. SVM was not suitable for the purpose of this research work: it predicted all 432 non-prostate cancer instances correctly, but wrongly classified all prostate cancer instances, with the number of AP recorded as zero (0).
The graphical representation is presented in Fig. 3. Table VII shows the Accuracy, Sensitivity, and Specificity of the developed ensemble model and the base models. Our ensemble model proved to be the most effective, with an Accuracy of 99.06%, a Sensitivity of 98.09%, and a Specificity of 99.54%, compared to the results of the other models displayed in the table. Fig. 4 shows a graphical representation of the evaluation of the proposed ensemble model against the base models.
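The reported metrics can be reproduced from the ensemble's confusion-matrix counts given above (205 and 4 for prostate cancer instances, 430 and 2 for non-prostate cancer instances). The sketch uses the standard definitions of accuracy, sensitivity and specificity; the function name is chosen for illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # share of actual positives detected
    specificity = tn / (tn + fp)  # share of actual negatives detected
    return accuracy, sensitivity, specificity

acc, sens, spec = classification_metrics(tp=205, fn=4, tn=430, fp=2)
print(f"{acc:.2%} {sens:.2%} {spec:.2%}")  # prints "99.06% 98.09% 99.54%"
```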
In order to evaluate the performance of the developed ensemble system, our results were compared with some existing works, as shown in Table VIII. The developed model proved to be a better model for the prediction of prostate cancer based on its high Accuracy. Fig. 5 shows a graphical representation of the comparison.

V. CONCLUSION
The developed model proved effective in detecting both non-prostate cancer and prostate cancer instances. Using sensitivity, specificity and accuracy as performance metrics, our results showed a prediction accuracy of 99.06%, a sensitivity of 98.09%, and a specificity of 99.54%. In other words, this ensemble model significantly addresses the issues of accuracy and sensitivity in the prediction of prostate cancer in men, showing a relative improvement over the individual base algorithms and some existing models.