Breast Cancer Diagnosis using Artificial Neural Networks with Extreme Learning Techniques

Breast cancer is the second leading cause of death among women. Early detection followed by appropriate cancer treatment can reduce this risk. Medical professionals can make mistakes while identifying a disease. Technology such as data mining and machine learning can substantially improve diagnostic accuracy. Artificial Neural Networks (ANN) have been widely used in intelligent breast cancer diagnosis. However, the standard gradient-based Back Propagation Artificial Neural Network (BP ANN) has some limitations: several parameters must be set before training, the training process takes a long time, and training can become trapped in local minima. In this research, we implemented ANN with extreme learning techniques for diagnosing breast cancer based on the Breast Cancer Wisconsin Dataset. Results showed that the Extreme Learning Machine Neural Network (ELM ANN) yields a classifier model with better generalization than BP ANN. This technique is promising as an intelligent component in medical decision support systems.
Keywords: breast cancer; artificial neural networks; extreme learning machine; medical decision support systems


I. INTRODUCTION
The uncontrolled growth of cells in an organ produces a tumor, which can be cancerous. There are two kinds of tumors, benign and malignant. Benign, or non-cancerous, tumors do not spread and are not life threatening. On the other hand, malignant, or cancerous, tumors spread and are life threatening [1]. Malignant breast cancer is defined as the case where the growing cells are in the breast tissue. Breast cancer is the second overall cause of mortality among women and the leading cause of death among women between 40 and 55 years of age [2]. Regular breast cancer diagnosis followed by appropriate cancer treatment can reduce this risk. It is suggested to perform a tumor evaluation test every 4-6 weeks. For that reason, distinguishing benign from malignant tumors based on classification features becomes very important [3].
Careful diagnosis in early detection has been proven to lower the death rate from breast cancer [4]. Depending on their expertise, medical professionals can make mistakes while identifying a disease. With the help of technology such as data mining and machine learning, diagnosis can be more accurate (91.1%) compared with a diagnosis made by an experienced doctor (79.9%) [5].
ANN is one of the best artificial intelligence techniques for common data mining tasks, such as classification and regression problems. A lot of research has shown that ANN delivers good accuracy in breast cancer diagnosis. However, this method has several limitations. First, ANN has some parameters to be tuned at the beginning of the training process, such as the number of hidden layers and hidden nodes, the learning rate, and the activation function. Second, the training process takes a long time because of the complex architecture and the parameter updates in each iteration, which are computationally expensive. Third, it can be trapped in local minima, so optimal performance cannot be guaranteed. Numerous attempts have been made to overcome these limitations of neural networks. Huang and Babri [6] showed that Single Hidden Layer Feedforward Neural Networks (SLFN) trained with a three-step extreme learning process, called ELM, can address these problems.
In this paper, we present the implementation of artificial neural networks with extreme learning techniques in breast cancer diagnosis. The dataset used for the experiments was the Breast Cancer Wisconsin Dataset, obtained from the University of Wisconsin Hospital, Madison, from Dr. William H. Wolberg [7]. We compared the performance of ELM with conventional BP ANN trained with gradient descent based learning algorithms. Sensitivity, specificity, and accuracy were used as performance measurements. Results showed that ELM ANN generally produced better results than BP ANN.
The rest of this paper is organized as follows. Section II is dedicated to the literature review, where a brief review of previous work in breast cancer diagnosis is presented. In Section III, the concept, mathematical model, and training process of the extreme learning machine are explained. In Section IV, experiments, results, and analysis are provided. Finally, conclusions and future works are given in Section V.

II. LITERATURE REVIEW
The use of classification systems in medical diagnosis, including breast cancer diagnosis, is growing rapidly. Evaluation and decision making by medical experts remain a key factor in diagnosis. However, an intelligent classification algorithm may help doctors, especially in minimizing errors made by inexperienced practitioners [3].
Several techniques have been deployed to predict and recognize meaningful patterns for breast cancer diagnosis. Ryu [8] developed a data classification method called isotonic separation. Its performance was compared against support vector machines, learning vector quantization, decision tree induction, and other methods on two breast cancer data sets, one with sufficient and one with insufficient data. The experimental results demonstrated that isotonic separation was a practical tool for classification in the medical domain.
A hybrid machine learning method was applied by Sahan [9] in diagnosing breast cancer. The method hybridized a fuzzy-artificial immune system with the k-nearest neighbour algorithm. The hybrid method delivered good accuracy on the Wisconsin Breast Cancer Dataset (WBCD). The authors believe it can also be tested on other breast cancer diagnosis problems.
A comprehensive view of the implementation of automated diagnostic systems for breast cancer detection was provided by Ubeyli [10]. That work compared the performance of the multilayer perceptron neural network (MLPNN), combined neural network (CNN), probabilistic neural network (PNN), recurrent neural network (RNN), and support vector machine (SVM). Its aim was to serve as a guide for readers who want to develop this kind of system.
Numerous combined and hybrid systems use neural networks as a component. However, since almost all of the neural networks employed are conventional gradient descent BP ANN, these novel or hybrid methods still suffer from the neural network drawbacks mentioned in the previous section.

III. EXTREME LEARNING MACHINE
Considerable effort has been made to overcome the weaknesses of BP ANN. Huang and Babri [6] demonstrated that a single hidden layer feedforward neural network (SLFN) with at most m hidden nodes is capable of approximating a function for m distinct vectors in the training dataset.

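Given m instances in the training dataset $\{(x^{(k)}, t^{(k)}) \mid x^{(k)} \in \mathbb{R}^n,\ t^{(k)} \in \mathbb{R}^p,\ k = 1, \ldots, m\}$, with $x^{(k)} = [x_1^{(k)}, \ldots, x_n^{(k)}]^T$ as features and $t^{(k)} = [t_1^{(k)}, \ldots, t_p^{(k)}]^T$ as target. A SLFN with M hidden nodes, activation function g(x) in the hidden nodes, and a linear activation function in the output nodes is written mathematically as:

$$\sum_{i=1}^{M} \beta_i \, g\left(w_i \cdot x^{(k)} + b_i\right) = o^{(k)}, \quad k = 1, \ldots, m \tag{1}$$

where $w_i \in \mathbb{R}^n$ is the vector of weights between the input nodes and the i-th hidden node, $\beta_i \in \mathbb{R}^p$ is the vector of weights between the i-th hidden node and the output nodes, $w_i \cdot x^{(k)}$ is the inner product of $w_i$ and $x^{(k)}$, $b_i$ is the bias of the i-th hidden node, and $o^{(k)} \in \mathbb{R}^p$ is the output of the neural network for the k-th vector.

That a SLFN can approximate m vectors means that there exist $w_i$, $\beta_i$, and $b_i$ such that:

$$\sum_{i=1}^{M} \beta_i \, g\left(w_i \cdot x^{(k)} + b_i\right) = t^{(k)}, \quad k = 1, \ldots, m \tag{5}$$

Equation (5) can be written as:

$$H \beta = T$$

where

$$H = \begin{bmatrix} g(w_1 \cdot x^{(1)} + b_1) & \cdots & g(w_M \cdot x^{(1)} + b_M) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x^{(m)} + b_1) & \cdots & g(w_M \cdot x^{(m)} + b_M) \end{bmatrix} \in \mathbb{R}^{m \times M}$$

is the hidden layer output matrix of the neural network,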
$\beta \in \mathbb{R}^{M \times p}$ is the matrix of weights between the hidden and output layers, and $T \in \mathbb{R}^{m \times p}$ is the matrix of target values of the m vectors in the training dataset.

In the traditional gradient descent based learning algorithm, the weights $w_i$ connecting the input layer and the hidden layer and the biases $b_i$ in the hidden nodes need to be initialized and then tuned in every iteration. This is the main factor that often makes the training process of neural networks time consuming, and the trained model may not reach the global minimum. Huang [11] proposed the minimum norm least-squares solution of SLFN, which does not need to tune those parameters. Training a SLFN with fixed input weights $w_i$ and hidden layer biases $b_i$ is equivalent to finding a least-squares solution $\hat{\beta}$ of the linear system $H\beta = T$:

$$\left\| H\hat{\beta} - T \right\| = \min_{\beta} \left\| H\beta - T \right\|$$

The smallest norm least-squares solution of that linear system is

$$\hat{\beta} = H^{\dagger} T \tag{11}$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix H. This solution has three important properties: minimum training error, smallest norm of weights, and uniqueness, the unique solution being $\hat{\beta} = H^{\dagger} T$.

The above minimum norm least-squares solution for SLFN is called the extreme learning machine (ELM). Given m instances in the training dataset $\{(x^{(k)}, t^{(k)}) \mid x^{(k)} \in \mathbb{R}^n,\ t^{(k)} \in \mathbb{R}^p,\ k = 1, \ldots, m\}$, an activation function g(x), and the number of hidden nodes M, the training process of ELM is the following:
1. Randomly set the input-hidden layer weights $w_i$ and biases $b_i$, i = 1,…,M.
2. Compute the hidden layer output matrix H.
3. Compute the output weights $\hat{\beta} = H^{\dagger} T$.
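The three training steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the sigmoid activation, the uniform [-1, 1] range for the random weights, the function names `elm_train` and `elm_predict`, and the toy data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(x); an assumed choice for this sketch."""
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, activation=sigmoid, seed=0):
    """Train a single hidden layer network with the three ELM steps.

    X has shape (m, n) and T has shape (m, p).
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Step 1: random input weights w_i and biases b_i, never tuned afterwards.
    # The uniform [-1, 1] range is an assumption, not taken from the paper.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden layer output matrix H with shape (m, M).
    H = activation(X @ W + b)
    # Step 3: minimum norm least-squares output weights, beta = pinv(H) @ T.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta, activation=sigmoid):
    """Network output o = H(X) @ beta (linear output nodes)."""
    return activation(X @ W + b) @ beta

# Toy data (made up): 30 samples, 9 features in [-1, 1], one binary target.
rng = np.random.default_rng(42)
X_demo = rng.uniform(-1.0, 1.0, size=(30, 9))
T_demo = (X_demo.sum(axis=1, keepdims=True) > 0).astype(float)

W, b, beta = elm_train(X_demo, T_demo, n_hidden=10)
outputs = elm_predict(X_demo, W, b, beta)
print(outputs.shape, beta.shape)
```

Only the output weights are learned, and they come from a single pseudoinverse computation, which is why no learning rate, momentum, or stopping criterion appears anywhere in the sketch.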
Based on that definition, there are three main differences between BP ANN and ELM ANN. First, BP ANN needs several parameters to be tuned, such as the number of hidden nodes, learning rate, momentum, and termination criteria. ELM ANN, on the other hand, is a simple, nearly tuning-free algorithm: the only parameter to be defined is the number of hidden nodes. Second, BP ANN works only with differentiable activation functions in the hidden and output nodes, while ELM ANN can use both differentiable and non-differentiable activation functions. Finally, BP ANN produces a trained model that only minimizes the training error, so training may end in a local minimum. ELM ANN, on the other hand, produces a trained model that has both minimum training error and the smallest norm of weights, so it can produce a model with better generalization and reach the global minimum [12].

IV. EXPERIMENTS, RESULTS, AND ANALYSIS
This section discusses the experimental design, the results obtained, and the analysis carried out in order to reach valid conclusions.

A. Experiments
The experiments consisted of three main steps: data gathering, data preprocessing, and performance evaluation. The dataset used in this experiment was the Breast Cancer Wisconsin Dataset, obtained from the University of Wisconsin Hospital, Madison, from Dr. William H. Wolberg [7]. The data has 699 instances with 10 attributes plus the class attribute. The class distribution is 65.5% (458 instances) benign and 34.5% (241 instances) malignant. The attribute information can be seen in TABLE I.
In the second step, the raw dataset was preprocessed to produce well-formed data suitable for the training and testing process. The first attribute, the sample code number, was removed because it is not relevant to the diagnosis. The next nine attributes were normalized into [-1, 1] and used as predictors. The last attribute was transformed to 0 (benign) and 1 (malignant) so that it could be properly fitted by the standard BP ANN and ELM ANN implementations.
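The preprocessing step described above can be sketched as follows. This is an illustrative sketch only: the paper does not state the normalization formula, so column-wise min-max scaling is assumed, and the helper names, the three sample rows, and the raw class codes 2 (benign) and 4 (malignant) are assumptions made for the example.

```python
import numpy as np

def scale_to_range(X, lo=-1.0, hi=1.0):
    """Min-max scale each column of X into [lo, hi] (assumed normalization)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # A constant column has zero span; use 1.0 there to avoid division by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return lo + (hi - lo) * (X - col_min) / span

def encode_labels(y, malignant_label):
    """Map the raw class attribute to 0 (benign) and 1 (malignant)."""
    return (np.asarray(y) == malignant_label).astype(int)

# Made-up rows: [sample code number, nine predictor attributes, class].
# The class codes 2 and 4 are an assumption about the raw file's encoding.
raw = np.array([
    [101, 5, 1, 1, 1, 2, 1, 3, 1, 1, 2],
    [102, 3, 4, 4, 5, 7, 10, 3, 2, 1, 2],
    [103, 8, 10, 10, 8, 7, 10, 9, 7, 1, 4],
])

X = scale_to_range(raw[:, 1:-1])              # drop the ID, scale nine predictors
y = encode_labels(raw[:, -1], malignant_label=4)
print(X.shape, y)
```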
The evaluation method in this experiment was k-fold cross-validation with k = 5. This means that the data were randomly divided into 5 partitions and 5 experiments were run. In each experiment, one partition was used as testing data and the remaining partitions were treated as training data.
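A minimal sketch of this partitioning is given below. The generator name `k_fold_indices` and the fixed random seed are assumptions for illustration; the paper does not describe how the random partitions were produced.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    # array_split allows partitions whose sizes differ by at most one.
    folds = np.array_split(shuffled, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# 699 instances, as in the dataset described above.
splits = list(k_fold_indices(699, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Because 699 is not divisible by 5, four partitions hold 140 instances and one holds 139.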
The standard performance measurement for classification problems is accuracy. However, since the class distribution was not balanced, it was important to use specificity and sensitivity as supplementary measurements. In addition, to minimize the effect of the randomly generated weights in BP ANN and ELM ANN, each experiment was run three times and the average results were recorded.
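The three measurements can be computed from the confusion counts as sketched below. The paper does not spell out the formulas, so the standard definitions are assumed here, with malignant (1) treated as the positive class; the function name and the toy labels are made up for the example, and both classes are assumed to be present in the test data.

```python
import numpy as np

def diagnostic_rates(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for 0/1 labels.

    Assumes 1 = malignant is the positive class and that both classes
    occur in y_true, so neither denominator is zero.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # malignant found
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # malignant missed
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # benign confirmed
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # benign flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Toy labels: 3 malignant and 5 benign cases, with one error of each kind.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec, acc = diagnostic_rates(y_true, y_pred)
print(sens, spec, acc)
```

In the toy example the accuracy is 0.75 while the sensitivity is only about 0.67, which illustrates why accuracy alone can hide missed malignant cases when the classes are unbalanced.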

B. Results and Analysis
With 5-fold cross-validation and each experiment run three times, there were 15 runs in total. All steps were carried out on a computer with an Intel® Core™ i3 processor, 4096 MB RAM, and the Windows 7 operating system. The results of ELM are given in TABLE II. To give a clear view of the performance of ELM ANN and BP ANN, the results were transformed into graphical charts. For each performance measurement, the average values were computed for each experiment.

V. CONCLUSION AND FUTURE WORKS
The performance of ELM ANN was generally better than that of BP ANN in breast cancer diagnosis. Although its specificity rate was slightly lower than that of BP ANN, ELM ANN clearly improved the sensitivity and accuracy rates. Based on these results, we conclude that ELM ANN yields a model with better generalization than BP ANN in diagnosing breast cancer based on the Breast Cancer Wisconsin Dataset.
Some work remains to be done in the near future. First, it is important to communicate the model to domain experts. A hybrid of ELM ANN with a decision tree, or any other technique that can produce a meaningful knowledge representation, would be promising. Second, to make an intelligent diagnosis tool that can be used by end users, it is necessary to develop an interactive user interface. Interfaces in the form of mobile, desktop, or web applications may be useful. Third, new cases are added regularly in the hospital. Intelligent diagnosis systems that can learn not only from the data available in repositories but also from newly available data will be required.
Fig 1 shows the comparison of average sensitivity rates between BP ANN and ELM ANN. The comparison of specificity rates is given in Fig 2, while accuracy rates can be seen in Fig 3.

Fig. 4. Overall average rates between BP ANN and ELM

TABLE III
From Fig 1 to Fig 3, we can see that ELM ANN was superior to BP ANN. ELM ANN had better performance in terms of sensitivity and accuracy in all experiments. To reach a general conclusion, an overall comparison needs to be computed. For each performance measurement, the cumulative average rates were compared. Fig 4 displays the overall sensitivity, specificity, and accuracy average rates of BP ANN and ELM ANN. The results showed that ELM ANN was generally better than BP ANN.