Hybrid Texture based Classification of Breast Mammograms using Adaboost Classifier

Breast cancer is one of the most dangerous and widespread cancers in the world, especially among women. Digital mammography is the most suitable tool for acquiring mammograms for cancer detection. The literature shows that if breast cancer is detected at an early stage, the chances of timely and effective treatment are much higher. Initial screening of mammograms is therefore essential for detecting cancer at its initial stages. Radiologists are expensive worldwide, and because the disease is so sensitive, it is difficult for an ordinary patient to obtain an opinion from more than one radiologist. A low-cost solution that can serve as a second opinion is therefore required. In this paper, a method is proposed that takes mammograms and detects cancer in them automatically, without the help of a radiologist or medical specialist, so that it can be adopted especially at the initial screening level. The proposed method first segments the portion of the image that contains the cancerous parts. Enhancement is then performed so that the cancer is clearly visible and identifiable. Texture features are extracted to classify the mammograms, and the ensemble classifier AdaBoost, which is based on the concept of combining intelligent experts, is used to classify those features. A standard dataset and well-known quantitative measures have been used to validate the proposed method, which has also been compared with existing methods. Results show that the proposed method achieves 96.74% accuracy and 98.34% sensitivity.
Keywords: Features; Segmentation; Breast mammograms; Classification; Texture


I. INTRODUCTION
Cancer is one of the most dangerous diseases and a leading cause of death worldwide. Different types of cancer affect different organs of the human body. For women, the breast is one of the most important organs, since a mother uses it to feed milk to her child after birth. Special care is therefore required to protect it from any type of disease or cancer. Because milk is transferred to the child, there is a chance that the cancer may also be passed to the child if it remains untreated or if the mother is unaware of the disease [1]. Breast cancer is the most common cancer among women, so special attention is required to address this problem. Mammography is an imaging process that can be used to detect cancer in the breast. Radiologists are expensive worldwide, and it is very difficult for an ordinary person to bear such expenses. In addition, this cancer must be diagnosed very carefully.
Most of the time, it is recommended to obtain a second opinion from another radiologist, but a lack of funds makes this very difficult. Nowadays, in this digital world, it is possible to introduce computer-based solutions to diagnose such cancers [2,3,4]. In the literature, many different Computer Aided Diagnosis (CAD) systems are available to provide the radiologist with a second opinion. Most of the existing systems have problems caused by poor imaging quality: some do not perform well in the presence of noise, or when a low radiation dose produces images of poor quality and low contrast. No available CAD system guarantees a solution, and there is still room to improve the performance of these systems [5,6]. Therefore, I have tried to propose a solution to detect cancer in breast mammograms.
In this paper, a new CAD system consisting of three main steps is proposed. First, the breast region of the mammogram is extracted using a bilateral filter with logarithm transformation. The bilateral filter smooths the gray levels while preserving the edges. The log transformation has the advantage of increasing the dynamic range, especially in the dark areas of the mammogram. Entropy is then calculated so that thresholding can be applied to binarize the image, and a seed point is selected from the white area so that the adaptive contour method can start. After the breast region has been extracted, enhancement is performed to improve performance. Finally, features are extracted and classified using an ensemble classifier.
The main contributions of the proposed method are the following:
• The proposed method also works well for low-contrast images, owing to the bilateral filter, log transformation and enhancement process.
• An adaptive contour method has been developed by combining the concept of entropy with active contour.
• Enhancement has been performed using a Partitioned Iterated Function System.
• Classification has been performed by the ensemble classifier AdaBoost.

II. PROPOSED METHOD
The proposed method consists of several phases. Figure 1 shows a sample image from the dataset; it clearly shows that a mammogram contains many different parts. Some portions are background, and some show muscle that is not part of the area in which cancer has to be found. It is therefore important to remove these unwanted parts and segment the required region for further analysis. In the first phase, segmentation is performed to extract the region of interest. In the second phase, the quality of the image is enhanced so that the region is clearly visible and easily identifiable. After that, features are required to classify those regions. Figure 1 shows that mammograms exhibit clear texture, so texture features are extracted and later used for classification. In daily life, different experts give their opinions and a final conclusion is reached by combining the opinions of all those experts. A similar concept is used in this paper to classify mammograms: the ensemble classifier AdaBoost combines the outputs of different classifiers and makes the final decision using the texture features. All these phases are described in detail below.

A. Preprocessing
In this phase, segmentation is performed to extract the region of interest that is later used for feature extraction and classification. For segmentation, an adaptive active contour method is used to segment the image automatically. Many researchers in the literature have used active contours, but the major problem with an active contour is the seed point from which it needs to start. Therefore, in this paper, I have modified the existing snake-based active contour by using the concept of entropy, which can be used to find the area where the active contour needs to start. Further, to improve performance, images are first filtered with a bilateral filter [7], which smooths the image while preserving the edges.
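As a rough illustration of why a bilateral filter smooths without blurring edges, the sketch below weights each neighbour by both its spatial distance and its intensity difference from the centre pixel. The function name `bilateral_filter`, the parameter values and the toy image are assumptions made for illustration only; they are not taken from the paper or from [7].

```python
import math

def bilateral_filter(image, radius=2, sigma_space=2.0, sigma_range=0.1):
    """Bilateral filter for a 2-D grayscale image (list of rows, values in [0, 1]).

    All parameter values are illustrative assumptions.
    """
    rows, cols = len(image), len(image[0])
    output = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            center = image[r][c]
            weighted_sum = 0.0
            weight_total = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue  # ignore neighbours outside the image
                    neighbour = image[rr][cc]
                    # Spatial weight: nearby pixels count more.
                    spatial = math.exp(-(dr * dr + dc * dc) / (2 * sigma_space ** 2))
                    # Range weight: pixels with similar intensity count more,
                    # so pixels across an edge contribute almost nothing.
                    similarity = math.exp(-((neighbour - center) ** 2) / (2 * sigma_range ** 2))
                    weight = spatial * similarity
                    weighted_sum += weight * neighbour
                    weight_total += weight
            output[r][c] = weighted_sum / weight_total
    return output

# Toy image: dark left half, bright right half, plus one slightly noisy pixel.
toy = [[0.1, 0.1, 0.9, 0.9] for _ in range(4)]
toy[1][1] = 0.15
smoothed = bilateral_filter(toy)
```

In this toy case the noisy pixel is pulled toward its dark neighbours while the dark/bright boundary stays sharp, which is the property the segmentation step relies on.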
After the image has been smoothed with its edges preserved, logarithm transformation [8] is applied to the image, as shown in Figure 2. The basic advantage of log transformation is that it increases the dynamic range, especially in the dark areas of the mammogram, as shown in Figure 3.
In the log transformation, Io is the output image, c is a constant, Iin is the input image and σ is the scaling factor that controls the input range to the logarithmic function. The log transformation therefore also improves the low-contrast areas and regions inside the image. After this step, active contour is applied to segment the breast region [9]. An active contour requires a seed point, and this seed point can be found using the concept of entropy: entropy is calculated and applied to the image as a threshold. After thresholding, a seed point can be selected from the white area, which represents the breast region of the image. The breast boundary obtained by thresholding alone is not accurate, because the intensities inside and outside the breast overlap, but it at least provides a point from which the active contour can start.
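The steps in this paragraph can be sketched as follows. Several parts are assumptions, since the source does not give the details: the log transformation is taken to have the common form Io = c · log(1 + (e^σ − 1) · Iin) with c = 1/σ, the entropy threshold is taken to be a Kapur-style maximum-entropy threshold, and the seed is taken to be the white pixel closest to the centroid of the white area. The function names and the toy image are hypothetical.

```python
import math

def log_transform(image, sigma=5.0):
    """Assumed form: I_o = c * log(1 + (e^sigma - 1) * I_in), with I_in in [0, 1].
    c = 1 / sigma is an assumed choice that keeps the output in [0, 1]."""
    c = 1.0 / sigma
    gain = math.exp(sigma) - 1.0
    return [[c * math.log(1.0 + gain * v) for v in row] for row in image]

def max_entropy_threshold(image, levels=256):
    """Kapur-style maximum-entropy threshold (an assumed choice of entropy method).
    Returns a value in [0, 1]; pixels >= this value are treated as white."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[min(levels - 1, int(v * (levels - 1)))] += 1
    total = float(sum(hist))
    probs = [h / total for h in hist]
    best_level, best_entropy = 0, -math.inf
    for t in range(levels - 1):
        p_low = sum(probs[: t + 1])
        p_high = sum(probs[t + 1:])
        if p_low <= 0.0 or p_high <= 0.0:
            continue  # one class would be empty
        h_low = -sum((p / p_low) * math.log(p / p_low) for p in probs[: t + 1] if p > 0)
        h_high = -sum((p / p_high) * math.log(p / p_high) for p in probs[t + 1:] if p > 0)
        if h_low + h_high > best_entropy:
            best_entropy, best_level = h_low + h_high, t
    return (best_level + 1) / (levels - 1)

def select_seed(binary):
    """Assumed rule: pick the white pixel closest to the centroid of the white area."""
    white = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    mean_r = sum(r for r, _ in white) / len(white)
    mean_c = sum(c for _, c in white) / len(white)
    return min(white, key=lambda p: (p[0] - mean_r) ** 2 + (p[1] - mean_c) ** 2)

# Toy mammogram: dark background on the left, brighter breast tissue on the right.
toy = [[0.02, 0.03, 0.02, 0.30, 0.40, 0.35] for _ in range(4)]
enhanced = log_transform(toy)
threshold = max_entropy_threshold(enhanced)
binary = [[v >= threshold for v in row] for row in enhanced]
seed = select_seed(binary)  # (row, column) starting point for the active contour
```

The seed only needs to fall inside the breast region; the contour, not the threshold, is what refines the boundary afterwards.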
The active contour is based on the snake model, a concept taken from [9]. First, the energy of the contour is calculated using the energy function shown in equation (3).
The external energy captures image properties such as edges or noise in the image. It can be calculated using equation (5).
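For reference, the snake formulation that active-contour methods of this kind usually follow is sketched below. This is the standard form from the snake literature and is only an assumption about what the paper's energy equations contain; the contour parameterisation v(s), the weights α(s) and β(s), and the edge-based image term are not taken from the source.

```latex
% Total snake energy along the parametric contour v(s) = (x(s), y(s)), s in [0, 1]
E_{snake} = \int_{0}^{1} \Big[ E_{int}\big(v(s)\big) + E_{img}\big(v(s)\big) + E_{con}\big(v(s)\big) \Big] \, ds

% Internal energy from the first and second derivatives of the contour
E_{int} = \frac{1}{2} \Big( \alpha(s) \, \lvert v'(s) \rvert^{2} + \beta(s) \, \lvert v''(s) \rvert^{2} \Big)

% A common edge-based choice for the image term (assumed here)
E_{edge} = - \lvert \nabla I(x, y) \rvert^{2}
```

The first-derivative term penalises stretching and the second-derivative term penalises bending, while the negative gradient magnitude pulls the contour toward strong edges such as the breast boundary.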
Thus, to start the contour, the initial point is selected from the white region obtained after thresholding, and the contour is then applied to the bilateral-filtered image so that the actual breast region can be extracted from the original image. The active contour process returns only the breast region, which is then used for enhancement and feature extraction.

B. Texture Features using Gabor Filter
A Gabor filter can be used to extract texture information. Texture is a specific repeating pattern, and mammogram images exhibit such patterns, so texture is well suited to feature extraction from mammograms. The characteristics of a texture can be represented by its spatial frequencies and orientations. Different types of Gabor filter can be applied to images to extract texture features, but for mammograms the 2-D Gabor filter is the most suitable because the images are two-dimensional. A Gabor filter is a Gaussian kernel function modulated by a sinusoidal wave of a specific frequency and orientation. In the equations that define the 2-D Gabor filter, x and y are the spatial variables, σx and σy are the scaling parameters of the filter, and W is the central frequency of the complex wave. A Gabor filter bank is a combination of Gabor filters applied at different scales, frequencies and orientations (Figure 4), and different filter banks can be generated by varying the orientations and scales. In this paper, the Gabor filter bank has been created using two frequencies, two scales and two orientations, which gives 2 × 2 × 2 = 8 filters. Specific values of these parameters were used to generate the filter bank. These eight filters are convolved with the original image, which returns eight new convolved images. The outputs of the filter bank are the magnitude values of the Gabor transform, and these magnitudes change very slowly with displacement. Statistical information is then extracted from the Gabor-filtered images: the mean, variance, skewness, kurtosis, entropy and energy are calculated from each filtered image and combined into a feature vector for classification.
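The feature extraction described in this subsection can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the kernel uses an isotropic Gaussian envelope (σx = σy), the frequency, scale and orientation values are hypothetical placeholders, and the entropy and energy definitions (histogram entropy, mean squared magnitude) are one common choice among several.

```python
import cmath
import math
from itertools import product

def gabor_kernel(freq, sigma, theta, half_size=3):
    """Complex 2-D Gabor kernel: Gaussian envelope times a complex sinusoid.
    An isotropic envelope (sigma_x = sigma_y = sigma) is assumed for brevity."""
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            row.append(envelope * cmath.exp(2j * math.pi * freq * x_rot))
        kernel.append(row)
    return kernel

def filter_magnitude(image, kernel):
    """Magnitude of the 2-D convolution; pixels outside the image count as zero."""
    rows, cols = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0j
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    rr, cc = r - dy, c - dx
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[dy + half][dx + half]
            out[r][c] = abs(acc)
    return out

def texture_statistics(values, bins=16):
    """Mean, variance, skewness, kurtosis, histogram entropy and energy."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    if std > 0:
        skew = sum(((v - mean) / std) ** 3 for v in values) / n
        kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    else:
        skew = kurt = 0.0
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width for flat responses
    hist = [0] * bins
    for v in values:
        hist[min(bins - 1, int((v - lo) / width))] += 1
    entropy = -sum((h / n) * math.log2(h / n) for h in hist if h)
    energy = sum(v * v for v in values) / n
    return [mean, var, skew, kurt, entropy, energy]

# Hypothetical bank parameters: 2 frequencies x 2 scales x 2 orientations = 8 filters.
FREQUENCIES = (0.1, 0.3)
SIGMAS = (1.5, 3.0)
THETAS = (0.0, math.pi / 2)

def gabor_features(image):
    features = []
    for freq, sigma, theta in product(FREQUENCIES, SIGMAS, THETAS):
        response = filter_magnitude(image, gabor_kernel(freq, sigma, theta))
        flat = [v for row in response for v in row]
        features.extend(texture_statistics(flat))
    return features

# Toy 8x8 image with vertical stripes standing in for a segmented breast region.
toy = [[float((c // 2) % 2) for c in range(8)] for _ in range(8)]
vector = gabor_features(toy)
```

With eight filters and six statistics per filtered image, the resulting feature vector has 48 entries per mammogram.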
Pseudocode of the AdaBoost classifier described in Section III is given in Figure 5.

IV. RESULTS AND DISCUSSION
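To test the performance of the proposed method, several quantitative measures have been used: accuracy, sensitivity, specificity and Area Under the Curve (AUC). Accuracy, sensitivity and specificity can be calculated using equations (5), (6) and (7).
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5} \]
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{6} \]
\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{7} \]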
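As a small worked example, the measures named above can be computed from confusion-matrix counts as follows. The definitions are the standard ones; the function names and the toy labels are hypothetical and do not come from the paper's dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Standard definitions of accuracy, sensitivity and specificity."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of cancerous cases detected
    specificity = tn / (tn + fp)   # fraction of normal cases correctly cleared
    return accuracy, sensitivity, specificity

# Toy labels: 1 = cancerous, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
accuracy, sensitivity, specificity = classification_metrics(*counts)
```

For a screening task, sensitivity is usually the measure to watch most closely, because a missed cancer (a false negative) is more costly than a false alarm.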
In equations (5) to (7), TP is the number of true positives, FP the number of false positives, FN the number of false negatives and TN the number of true negatives.

I have performed three types of experiments by dividing the data into training and testing sets in different ratios, so that there is no bias in training and testing. Three ratios were used: 40-60 (40% for training and 60% for testing), 50-50 (50% for training and 50% for testing) and 60-40 (60% for training and 40% for testing). Accuracy, sensitivity and specificity are measured, and from these the Area Under the Curve (AUC) is also calculated to show the performance of the proposed method. I have used different classifiers to determine which one is best suited to this problem (Figure 6): Support Vector Machine (SVM), K-nearest neighbour (KNN), Artificial Neural Network (ANN) and the ensemble classifier have been compared using the same feature set. Results are shown in Tables 2 and 3. These results show that the proposed method with the ensemble classifier performs best in all cases, achieving the best accuracy, sensitivity, specificity and AUC. Tables 2 and 3 also show that enhancement improves performance. Therefore, for the comparison with existing methods, I used the ensemble classifier with enhancement and selected the best ratio, 60-40, where 60% of the data is used for training and 40% for testing; the results are shown in Figure 6. I then compared the proposed method with existing methods. Results are shown in Table 2. These results show that the proposed method gives the best results among all the existing methods in both accuracy and sensitivity. The main reasons for the improved performance are good segmentation of the breast region and suitable feature extraction; the ensemble classifier also plays an important
role in increasing the performance. The same performance measures and classifier parameters are used throughout.

V. CONCLUSION
In this paper, I have proposed a computer aided diagnosis system that performs three different tasks. First, breast segmentation is performed using a combination of bilateral filter, log transformation, entropy and adaptive active contour. Enhancement is then performed using the concept of Partitioned Iterated Function System. Finally, the most suitable texture features are extracted and classified by an ensemble classifier, which performs well compared with other classifiers. Owing to these contributions, the proposed system performs well. In the future, I will try other features for classification. Deep learning is also well suited to this problem, so deep learning can be applied in future work to test its performance.

Fig. 3. Results of log transformation on mammogram images
In the energy function of equation (3), Eint is the internal energy, Eimg represents the image forces and Econ represents the external constraint forces. The internal energy is calculated from the first and second derivatives of the parametric curve equation, using equation (4).

Fig. 4. Gabor filters with 2 orientations and 4 scales

III. CLASSIFICATION USING ADABOOST
Classification is the process of assigning samples to classes on the basis of their characteristics. In the literature, many different classifiers are available that classify individually. Ensemble classification uses several weak classifiers and combines them intelligently to improve classification performance. One of the most important ensemble classifiers is AdaBoost, also known as adaptive boosting. AdaBoost was proposed in [14], and it improves a simple classifier through an iterative procedure. In each iteration, the weights of misclassified samples are increased and the weights of correctly classified samples are decreased. In this way, hard samples receive more emphasis and the weak classifiers are forced to learn from these difficult samples [14]. Classification performance improves during this iterative weight adjustment procedure, and the resulting adaptive weights can be used for the classification of new samples. The algorithm assumes that the training set contains m samples labelled -1 and +1. A new sample x is classified by a weighted vote of all classifiers Mt with weights αt. Mathematically, it can be written as:
\[ H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t M_t(x) \right) \]
where T is the number of weak classifiers.
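The iterative re-weighting and weighted vote described in this section can be sketched as below. This is a minimal illustration of the standard AdaBoost procedure, not the paper's implementation: threshold stumps are assumed as the weak classifiers, and all function names and the toy data are hypothetical.

```python
import math

def stump_predict(stump, sample):
    """A threshold stump: predicts `polarity` when the feature is >= threshold."""
    feature, threshold, polarity = stump
    return polarity if sample[feature] >= threshold else -polarity

def train_stump(samples, labels, weights):
    """Return (weighted error, stump) for the stump with the lowest weighted error."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({s[feature] for s in samples}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                error = sum(
                    w for s, y, w in zip(samples, labels, weights)
                    if stump_predict(stump, s) != y
                )
                if best is None or error < best[0]:
                    best = (error, stump)
    return best

def adaboost_train(samples, labels, rounds=3):
    m = len(samples)
    weights = [1.0 / m] * m  # start with equal weights on all m samples
    ensemble = []
    for _ in range(rounds):
        error, stump = train_stump(samples, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        # Increase weights of misclassified samples, decrease the others.
        weights = [
            w * math.exp(-alpha * y * stump_predict(stump, s))
            for s, y, w in zip(samples, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, sample):
    """Weighted vote of all weak classifiers; labels are -1 and +1."""
    score = sum(alpha * stump_predict(stump, sample) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D problem that no single threshold can separate.
samples = [[1], [2], [3], [4], [5], [6]]
labels = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_train(samples, labels, rounds=3)
predictions = [adaboost_predict(ensemble, s) for s in samples]
```

No single stump classifies this toy set correctly, but the weighted vote of three stumps does, which is the behaviour the "combination of experts" analogy in the text refers to.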

TABLE I. LIST OF ABBREVIATIONS USED IN THIS PAPER