Determining the Efficient Structure of Feed-forward Neural Network to Classify Breast Cancer Dataset

Abstract: Classification is one of the most frequently encountered problems in data mining. A classification problem occurs when an object needs to be assigned to one of several predefined classes based on a number of observed attributes of that object. Neural networks have emerged as one of the tools that can handle the classification problem. Feed-forward Neural Networks (FNNs) have been widely applied in many different fields as a classification tool. Designing an efficient FNN structure with the optimum number of hidden layers and the minimum number of neurons per layer, for a specific application or dataset, is an open research problem. In this paper, experimental work is carried out to determine an efficient FNN structure, that is, a structure with the minimum number of hidden-layer neurons, for classifying the Wisconsin Breast Cancer Dataset. We achieve this by measuring the classification performance using the Mean Square Error (MSE) while varying the number of hidden layers and the number of neurons in each layer. The experimental results show that the number of hidden layers has a significant effect on the classification performance, and that the best average classification performance is attained when the number of layers is 5 and the number of neurons per hidden layer is small, typically 1 or 2.


I. INTRODUCTION
Classification is one of the most frequently encountered problems in decision-making tasks. A classification problem occurs when an object needs to be assigned to one of several predefined classes based on a number of observed attributes of that object. Many problems in business, science, industry, and medicine can be treated as classification problems.
Neural networks have emerged as one of the tools that can handle the classification problem. Their advantage is that they are data-driven, self-adaptive methods: they can adjust themselves to the data without any explicit specification of a functional form for the underlying model, and they can approximate any function with arbitrary accuracy.
Artificial neural networks consist of an input layer of nodes, one or more hidden layers, and an output layer. Each node in a layer has one corresponding node in the next layer, thus creating the stacking effect [1].
The hidden layer is a collection of neurons that provides an intermediate layer between the input layer and the output layer. Activation functions are typically applied to hidden layers.
Neural networks are biologically inspired and mimic the human brain. A neural network consists of neurons that are interconnected by links, where each link has a weight that is multiplied by the signal transmitted through it. The output of each neuron is determined by an activation function such as the sigmoid or the step function. Usually nonlinear activation functions are used. Neural networks are trained by experience. When an unknown input is applied to the network, it can generalize from past experience and produce a new result [2], [3].
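The computation performed by a single neuron, as described above, can be sketched in a few lines of Python. This is only an illustration: the input values, the weights, and the bias term are assumptions chosen for the example and are not taken from the experiments in this paper.

```python
import math


def step(z):
    """Step activation: 1 if the weighted input is non-negative, else 0."""
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    """Logistic sigmoid activation, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation):
    """Multiply each incoming signal by its link weight, sum, then activate."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Illustrative values only (assumed, not from the paper).
x = [1.0, 0.5]
w = [0.4, -0.8]
b = 0.0  # the bias term is an assumption; the text mentions only link weights

print(neuron_output(x, w, b, step))     # weighted sum is 0.0, so step gives 1.0
print(neuron_output(x, w, b, sigmoid))  # sigmoid(0.0) is 0.5
```

The same weighted sum yields a hard 0/1 decision under the step function and a graded value under the sigmoid, which is why the sigmoid is the usual choice when the output is compared with a target through an error measure.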
Feed-forward neural networks (FNNs) are among the most popular structures of artificial neural networks. These efficient networks are widely used to solve complex problems by modeling complex input-output relationships [4], [5].
Neural networks have been widely used for breast cancer diagnosis [6] [7] [8], and the Feed-forward Neural Network (FNN) is commonly used for classification. Many studies evaluate the effect of the number of neurons in the hidden layer [9] [10] [11] [12].
In this paper an experimental investigation is conducted to examine the effect of the number of neurons and hidden layers of a feed-forward neural network on classification performance for the breast cancer dataset. The rest of the paper is organized as follows. The second section introduces the materials and methods. The third section presents the experiments and results. Section four gives the discussion and conclusions.

II. MATERIALS AND METHODS
The performance analysis of an FNN consists of estimating the training and generalization errors. The result with the minimum estimated generalization error is used to determine an optimum for the application of the neural network model [13].
The feed-forward neural network is built using the Levenberg-Marquardt training algorithm, which is widely used in the classification literature [14,15,16]. The network architecture is composed of nine neurons in the input layer and one neuron in the output layer.
www.ijacsa.thesai.org
To achieve the paper's objectives, the number of hidden layers and the number of neurons per hidden layer are changed during the training and simulation of the network. A learning rate of 0.5 was used. The maximum number of epochs allowed was 1000. The activation function used in the different layers of the neural network is logsig. The classification performance is measured by the mean square error (MSE), which is calculated by equation 1.

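$$\mathrm{MSE} = \frac{1}{N}\sum_{k=1}^{N} e(k)^{2} = \frac{1}{N}\sum_{k=1}^{N}\bigl(t(k) - a(k)\bigr)^{2} \qquad (1)$$
where t(k) is the target output, a(k) is the network output, $e(k) = t(k) - a(k)$ is the error for the k-th sample, and N is the number of samples.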
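As a minimal sketch of how a network of this shape produces an output a(k) and how the MSE is then evaluated against the targets t(k), the following Python fragment may help. It assumes fully connected layers with bias terms, random initial weights, two arbitrary hidden layers of 3 and 2 neurons, and made-up samples and 0/1 targets. It does not implement Levenberg-Marquardt training, and all function names are hypothetical.

```python
import math
import random


def logsig(z):
    """Log-sigmoid activation: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(layer_sizes, seed=0):
    """Random weights and biases for a fully connected net, e.g. [9, 3, 2, 1]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate one input vector through every layer using logsig."""
    a = x
    for weights, biases in layers:
        a = [logsig(sum(w * v for w, v in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a


def mean_square_error(targets, outputs):
    """Mean of the squared differences between targets t(k) and outputs a(k)."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equally long")
    return sum((t - a) ** 2 for t, a in zip(targets, outputs)) / len(targets)


# 9 inputs, hidden layers of 3 and 2 neurons (arbitrary choice), 1 output.
net = init_network([9, 3, 2, 1])

# Made-up samples with attribute values in 1..10 and made-up 0/1 targets.
samples = [[5, 1, 1, 1, 2, 1, 3, 1, 1], [8, 10, 10, 8, 7, 10, 9, 7, 1]]
targets = [0.0, 1.0]
outputs = [forward(net, x)[0] for x in samples]

print(mean_square_error(targets, outputs))  # MSE of the untrained network
```

Because the weights here are untrained, the printed value only demonstrates the calculation; it says nothing about the results reported in the tables.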
In this paper the Wisconsin Breast Cancer Data (WBCD) is used, which has been analyzed by various researchers of medical diagnosis of breast cancer in the neural network literature [5], [16], [17], [18]. This data set contains 699 instances. The first attribute is the ID of an instance, the next 9 attributes (Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli and Mitoses) represent different characteristics of an instance, and the last attribute is the class, which takes the values 2 and 4 (2 for benign, 4 for malignant). Each instance therefore belongs to one of 2 possible classes (benign or malignant).
In our classification experiments all 9 attributes are used. Each attribute has the domain 1-10. The data set was partitioned into two sets: 80% of the data is used for training, and the remaining 20% is used for testing. The testing set was not seen by the neural network during the training phase; it is only used for testing the neural network after training.
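The data preparation described above can be sketched as follows. The fragment assumes each row is laid out as ID, nine attributes, class code, that the class codes 2 and 4 are recoded to 0 and 1 to match the range of a logsig output, and that the split is made after a random shuffle; none of these details is stated in the paper. The rows are synthetic placeholders, not the real WBCD records, and the function names are hypothetical.

```python
import random


def encode_label(class_value):
    """Map the class codes (2 = benign, 4 = malignant) to 0/1 (assumed encoding)."""
    return {2: 0, 4: 1}[class_value]


def prepare(rows):
    """Drop the ID column and return (nine attributes, encoded label) pairs."""
    return [(row[1:10], encode_label(row[10])) for row in rows]


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into training and testing sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Synthetic stand-in for the 699 instances: ID, nine attributes in 1..10, class.
rng = random.Random(1)
rows = [[1000 + i] + [rng.randint(1, 10) for _ in range(9)] + [rng.choice([2, 4])]
        for i in range(699)]

samples = prepare(rows)
train, test = train_test_split(samples)
print(len(train), len(test))  # 559 140
```

With 699 instances, an 80/20 split gives 559 training and 140 testing samples when the cut point is rounded to the nearest integer.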

III. EXPERIMENTS AND RESULTS
Five experiments are carried out. In the first experiment the network is trained with one hidden layer; the number of neurons in this hidden layer is varied from 1 to 20. The network was trained 30 times for each structure.
Table (1) lists, in ascending order, the minimum mean MSE values over the 30 training trials for each number of neurons. In the second experiment, the network was trained with two hidden layers. For the first hidden layer, the numbers of neurons that achieved the best performance in the first experiment are used. For the second hidden layer, the number of neurons is varied from 1 to 10. The network was trained 30 times for each possible pair of neuron counts. Table (2) shows the pairs of neuron counts in the two layers that achieve the minimum mean MSE.
In the third experiment the network is trained with three hidden layers. The numbers of neurons in the first and second hidden layers are the pairs that gave the best performance in experiment two. The number of neurons in the third hidden layer is varied from 1 to 10. The network was trained 30 times for each possible triple of neuron counts. Table (3) shows the numbers of neurons in the three layers that achieve the minimum mean MSE.
In a similar manner, in experiments 4 and 5 the network is trained using four and five hidden layers respectively. Tables (4) and (5) show the numbers of neurons in the different layers that achieve the minimum mean MSE.
The last row of Table (5) shows the average performance of the 7 best neuron configurations for each number of layers. This row shows that increasing the number of hidden layers from 1 to 3 gradually lowers the classification performance, and that the performance then improves when the number of layers is 4 and 5. Thus the number of hidden layers has a significant effect on the classification performance. Since both the best performance and the best average performance are attained when the number of layers is 5, this indicates that increasing the number of hidden layers leads to better classification performance for the Breast Cancer Data set.
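The layer-by-layer procedure of the five experiments can be summarized in a short Python sketch. It assumes that the 7 structures with the lowest mean MSE are carried from each experiment to the next (the text does not state how many were kept, beyond reporting averages over the best 7), and it uses a placeholder scoring function in place of real network training, so the numbers it produces are not experimental results. All names are hypothetical.

```python
from statistics import mean


def mean_mse(structure, train_and_score, trials=30):
    """Average the MSE of one structure over repeated training trials."""
    return mean(train_and_score(structure, trial) for trial in range(trials))


def layerwise_search(train_and_score, max_layers=5, first_range=range(1, 21),
                     next_range=range(1, 11), keep=7, trials=30):
    """Grow the network one hidden layer at a time, keeping the best structures."""
    candidates = [(n,) for n in first_range]
    best_per_depth = {}
    for depth in range(1, max_layers + 1):
        scored = sorted((mean_mse(s, train_and_score, trials), s) for s in candidates)
        kept = scored[:keep]  # assumed: carry the `keep` best structures forward
        best_per_depth[depth] = kept
        # Extend every kept structure with each candidate size for the new layer.
        candidates = [s + (n,) for _, s in kept for n in next_range]
    return best_per_depth


def fake_train_and_score(structure, trial):
    """Placeholder for training a network; returns a made-up MSE, NOT a result."""
    return 0.01 * sum(structure) + 0.001 * trial


results = layerwise_search(fake_train_and_score)
for depth, kept in results.items():
    best_score, best_structure = kept[0]
    print(depth, best_structure, round(best_score, 4))
```

In practice `fake_train_and_score` would be replaced by a routine that trains the network with the given hidden-layer sizes and returns its MSE. Restricting each experiment to extensions of the previously best structures keeps the number of trained networks manageable compared with an exhaustive search over all layer-size combinations.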
All tables show that the best performance is achieved when the number of neurons in the second and subsequent hidden layers is small, typically 1 or 2.
So the final conclusion is that, to achieve better classification performance of Breast Cancer Dataset using a

TABLE I. THE MINIMUM MEAN OF MSE IN THE 1ST EXPERIMENT

TABLE II. THE MINIMUM MEAN OF MSE IN THE 2ND EXPERIMENT

TABLE III. THE MINIMUM MEAN OF MSE IN THE 3RD EXPERIMENT

TABLE IV. THE MINIMUM MEAN OF MSE IN THE 4TH EXPERIMENT