Hybridized Machine Learning based Fractal Analysis Techniques for Breast Cancer Classification

The usefulness of Fractal Analysis (FA) is not limited to a particular area. It is applied in a variety of fields and has proven effective for describing irregular objects. Fractal dimension is a well-established measure of the roughness of natural elements and can therefore be treated as a feature of a natural object. Breast masses are irregular and diverse, ranging from malignant tumors to benign ones; hence breast imaging is one of the areas where fractal geometry can be applied well. This opens the possibility of using fractal geometry as a feature extraction technique for mammograms. On the other hand, the support vector machine is an emerging technique for classification. The survey shows that few works have been done on breast mass classification using the support vector machine. In our work, two effective techniques that perform well in their respective fields are used in separate stages: FA with the Box Count Method (BCM) and the Support Vector Machine (SVM). Feature extraction is done through the Box Count Method. The extracted feature, the fractal dimension, measures the complexity of each image in the input data set of 42 images. In the next stage, the resulting Fractal Dimensions (FD) are passed to the support vector machine classifier to classify benign and malignant cells. The result analysis shows that the combination of SVM and FD yielded the highest accuracy, 98.13%.
Keywords: Mammography; feature extraction; fractal dimension; box-counting method; classification; support vector machine


I. INTRODUCTION
Nowadays many women suffer from breast cancer, which is one of the leading causes of cancer-related death [1]. Because women in urban areas and developing cities are often not treated at an early stage, the survival rate is very low. Women's breasts can contain micro-calcifications, which are tiny calcium deposits in the breast tissue; each appears as a small bright spot in the mammogram. Micro-calcifications are very small (in the range of 0.05-1 mm), so they are very difficult to detect. Image processing is widely used in medicine, mainly for the diagnosis of diseases or disorders of the human body. The anatomical structure of breast cancer can be observed through medical images, which can be acquired with high-quality imaging tools such as X-ray, mammography, thermography, ultrasonography, and Magnetic Resonance Imaging.
Mammography and X-ray are both treated as standard methods for breast cancer detection, but it has been noticed that they are unable to detect a mass until it reaches a specific size. Because of the high radiation involved, these methods are not preferred for women below the age of 40 [2].
To overcome the above limitations, a tool known as the "thermograph" was introduced. It is non-ionizing, radiation-free, convenient and beneficial. It is useful for routine check-ups to increase the certainty of identifying breast malignancy. Owing to its convenience and non-ionizing nature, it can be used in urban areas [3]. The thermograph technique detects the temperature distribution over the entire surface of the breast. Due to angiogenesis, the temperature of the skin over the tumor is higher than that of the surrounding area. The anomalous region is easy to find with the help of an infrared camera [4][5].
Cancerous tumors exhibit randomness in their growth and are usually irregular and complex in shape. FA can therefore describe their complex patterns better than conventional Euclidean geometry. Several computer-aided methods have been developed to help specialists improve the efficiency and accuracy of mammographic screening programs [6][7][8][9][10]. Different fractal-based methods have been used by researchers in different fields to estimate the fractal dimension of natural objects such as clouds, trees, and deserts. Nguyen and Rangayyan [12] used fractal analysis to identify abnormal sections in the mammogram, before testing with an infrared camera. In 2010, Tavakal et al. [13] introduced the concept of FA for the segmentation and classification of breast thermograms as benign (range <=1) or malignant (>1).
This paper is organized as follows: Section 1 provides an overview; Section 2 is dedicated to the related work done in this area; and Section 3 presents all the methodologies which are used. Similarly, Section 4 describes the experimental results. Finally, the work is concluded in Section 5.

II. RELATED WORK
Several researchers have introduced different methodologies for extracting features and classifying the mammogram images.
Heriana and Soesanti [15] proposed a method that takes the image dataset, extracts features through a fractal algorithm and, after successful extraction, classifies them using the C-means clustering algorithm. The result shows that 64 x 64 pixel box sizes are more consistent than 32 x 32 pixel ones. Several methods have been developed for classification and segmentation, such as image filtering and local thresholding [16], stochastic fractal methods [17], wavelet analysis, and fuzzy logic [18]. A few years later, Chang and Chen [22] were able to separate tumors into malignant and benign types by calculating the fractal dimension of ultrasound images. Rangayyan and Shen [23] used the Fourier transform to detect the cancer-affected area. They used a neural network classifier and obtained 89.21% accuracy in their preliminary analysis stage. In 2010, Nooden and other researchers used a probabilistic neural network to detect cancerous zones in mammograms [24]. Moldovanu and Moraru [25] tried to highlight the connection between the fractal dimension of breast cancer and the knowledge extracted using the k-means algorithm. In 2010, Patel and Sinha [19] developed a strategy for clinical image enhancement based on the idea of the fractal derivative and image processing techniques such as segmentation of images with self-similar properties. Their paper presents detailed results of automatic detection of breast cancer masses using self-similar fractal-based segmentation. The review in [14] shows that the concept of region of interest (ROI) is also one of the interesting techniques for identifying cancerous areas. After that, Nam and Chai [20] used the box-counting method to identify areas of established micro-calcification in mammograms and found the zones of cancerous micro-calcification cells. In 2001, the fractal concept was also used by Zheng and Chan to locate tumor cells on mammograms [11]. They divided the image into boxes of size 16 x 16 and determined the fractal dimension. They noticed that the fractal dimensions of the portions that contain cancer lie within a certain range. Classification of galactograms using fractal properties was also done in 2006 [21]. Similarly, in 2014 Netprasat and co-authors [26] developed architectural distortion detection using SVM with an accuracy of 91.67%.
Recently, Roy and Gogoi [28] presented two effective features, fractal geometry and lacunarity, on mammograms and thermograms. They showed that these features give better results than a texture feature. One of the fractal algorithms, the box-counting method, was used by Zheng et al. and Rangayyan et al. to detect distorted sites in a mammogram. Similarly, Tavakol [27] introduced a method of fractal analysis for the segmentation of breast cancer. Roy et al. [28] used the Hurst coefficient and lacunarity features to separate normal and abnormal states in a breast thermogram. They noticed that the value of lacunarity was greater in an abnormal state than in the normal state. Sankar and Thomas [29] proposed a methodology to distinguish benign and malignant tumors in breast mammograms. Machireddy and others [30] investigated in 2019 whether multiresolution FA of voxel-based dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) parametric maps can provide early prediction of breast cancer response to neoadjuvant chemotherapy (NACT). Le Hoang Son et al. [31] focus on recent developments in research on machine learning for big data analytics and other techniques in response to advanced modern computing for different applications. Chatterjee [32] attempted to give a clearer and deeper understanding of the IoT in a big data framework along with its different issues and challenges, and focused on giving potential solutions through machine learning techniques. Chatterjee [33] discussed various issues relating to bioinformatics data sets and made different proposals on the proper use of machine learning techniques for bioinformatics research. The authors of [34] efficiently applied fractal dimension to detect tumors. The fractal concept can be applied not only to gray images but also to different color models, as shown in [35]. It is also seen that fractal geometry works properly in face shape classification [36].

A. Modified Relative Improved Differential Box Counting Method
The steps for this method are as follows:
1) The image of size $M \times M$ is divided into blocks of size $l \times l$, and the entire image surface is covered with boxes of size $l \times l \times l'$, where $l' = l \times G/M$ and $G$ is the number of gray intensity levels.
2) The boxes of size $l \times l \times l'$ are stacked in each block, starting from the pixel with the minimum gray level in the block.
3) The number of boxes in each block, $n_r$, is computed.
4) The total number of boxes needed to cover the whole image, $N_r$, is obtained using Eq. 1.
5) Different values of $N_r$ are calculated for different scales $l$.
6) The logarithm of both $N_r$ and the scale is taken, and a graph is plotted with $\log(N_r)$ on the y-axis and $\log(r)$ on the x-axis.
7) The best-fit line through the plotted points is drawn, and the slope of this line gives the FD of the image.
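The steps above can be sketched in Python as follows. This is a minimal illustration of plain differential box counting, not the authors' Modified Relative Improved variant, whose per-block formula is not reproduced in the text. Several choices are assumptions made for the example: the per-block count is taken as the number of boxes of height l' spanned between the darkest and brightest pixel, the scale is r = l/M, the slope is fitted against log(1/r) so that the estimate is positive, and the function names `box_count` and `fractal_dimension` are illustrative.

```python
import math

def box_count(image, l, G=256):
    """Total box count N_r for an M x M gray image at block size l."""
    M = len(image)
    height = l * G / M                     # box height l' = l * G / M
    total = 0
    for i in range(0, M, l):
        for j in range(0, M, l):
            block = [image[a][b]
                     for a in range(i, min(i + l, M))
                     for b in range(j, min(j + l, M))]
            lo, hi = min(block), max(block)
            # n_r (assumed form): boxes spanned from darkest to brightest pixel
            total += math.floor(hi / height) - math.floor(lo / height) + 1
    return total

def fractal_dimension(image, sizes, G=256):
    """Least-squares slope of log(N_r) against log(1/r), with r = l / M."""
    M = len(image)
    xs = [math.log(M / l) for l in sizes]
    ys = [math.log(box_count(image, l, G)) for l in sizes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Two synthetic 64 x 64 test images (not data from the paper).
M = 64
flat = [[100] * M for _ in range(M)]
checker = [[255 * ((a + b) % 2) for b in range(M)] for a in range(M)]
fd_flat = fractal_dimension(flat, [2, 4, 8, 16])
fd_checker = fractal_dimension(checker, [2, 4, 8, 16])
print(fd_flat, fd_checker)
```

With this convention a perfectly flat image gives a slope of about 2 and a full-range checkerboard gives a slope of about 3, which are the two extremes for a gray-level surface.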

B. Support Vector Machine (SVM) Classification
An SVM is a discriminative classifier. Its main objective is to construct a separating hyperplane that divides the data into classes. For a robust classification, the hyperplane should lie at the maximum possible distance from the support vector lines of the individual classes. The algorithmic steps are given below:
1. Start by taking as input the fractal dimension extracted using fractal geometry.
2. The 42 data points lie in a roughly linear arrangement before SVM is applied.
3. For classification, take a line together with two parallel lines equidistant from it.
4. Pick a large number (the number of repetitions, or epochs).
5. Pick a number close to 1 (the expanding factor, 0.99).
6. Pick a random point. If the point is correctly classified, then take it for the classification process. The point $x$ is classified by the sign of
$$w^{T}x + b \qquad (3)$$
with $w^{T}x + b < 0$ falling under class "-1" and $w^{T}x + b > 0$ under class "+1". For $x \in \mathbb{R}^2$, the separating line is
$$w_1 x_1 + w_2 x_2 + b = 0 \qquad (4)$$
where the weights $w$ set the slope $m$ of the line and $b$ is the intercept.
7. If the point is not correctly classified, move the line towards the point and separate the lines using the expanding factor.
8. After finding the final line, the dataset can be separated into two classes.
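As a rough illustration of the sign-based decision rule in Eq. (3), the sketch below fits a one-feature linear classifier by batch subgradient descent on the regularized hinge loss. This is a standard way to train a linear SVM and is swapped in here for simplicity; it is not the line-moving procedure with an expanding factor listed above. The FD values, the labels, the centering of the feature, and the hyperparameters (`lr`, `lam`, `epochs`) are all hypothetical choices made for the example.

```python
def train_linear_svm(xs, ys, lr=0.1, lam=0.001, epochs=2000):
    """Minimize lam/2 * w^2 + mean(max(0, 1 - y * (w * z + b)))."""
    n = len(xs)
    mean = sum(xs) / n
    zs = [x - mean for x in xs]            # center the feature (assumption)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w, grad_b = lam * w, 0.0
        for z, y in zip(zs, ys):
            if y * (w * z + b) < 1:        # inside the margin or misclassified
                grad_w -= y * z / n
                grad_b -= y / n
        w -= lr * grad_w
        b -= lr * grad_b
    # Undo the centering: w * (x - mean) + b == w * x + (b - w * mean)
    return w, b - w * mean

def predict(w, b, x):
    """Class +1 if w * x + b > 0, otherwise class -1."""
    return 1 if w * x + b > 0 else -1

# Hypothetical fractal dimensions and labels (-1 = benign, +1 = malignant).
fds = [1.2, 1.3, 1.4, 1.6, 1.7, 1.8]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(fds, labels)
predictions = [predict(w, b, x) for x in fds]
print(w, b, predictions)
```

Centering the single feature keeps the optimization well conditioned; without it, plain subgradient descent on raw FD values converges much more slowly.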

IV. EXPERIMENTAL RESULTS AND DISCUSSION
Medical images are usually noisy and are not in an acceptable state for classification. A pre-processing technique is therefore used to improve the image quality. Fig. 1 shows the overall classification procedure, in which the fractal geometry technique, the Modified Relative Improved Differential Box Counting Method, is used for feature extraction. Finally, SVM is applied to classify the dataset as malignant or benign.
The breast cancer dataset was retrieved from "mammoimage.org" and "visualsonline.cancer.gov". The dataset consists of 42 instances, and the experiment shows that 20 of these cases are benign and 22 are malignant. The dataset is separated into two classes, 0 and 1, where 0 identifies the benign class and 1 identifies the malignant class.
MATLAB R2016a is used as the application software for extracting features of the 42 images using the fractal dimension estimation method, and the results are displayed in Table I. For testing, all inputs are gray images. In this study, the classification testing is done through two approaches, and the simulation is done using the open-source Scikit-learn framework. First, the data are tested with a support vector machine that linearly classifies the dataset into two classes, malignant and benign. The second classification is based on a support vector machine with kernel functions, namely the polynomial and radial basis function (RBF) kernels. In our work, more emphasis is given to the linear SVM classifier, and the kernel functions are included to compare the accuracy of the classifiers. The classification uses SVM as a supervised learning method, and the results show that, out of the 42 images in the dataset, 20 belong to the benign class and 22 to the malignant class.
If the data are linearly separable, it is easy to use a linear support vector machine, but if the data are high-dimensional, a non-linear SVM with a kernel function is needed to reduce the computational cost. Fig. 1 and Fig. 2 show an example of an original image and its corresponding micro-calcification extraction, respectively (Fig. 3). The linear SVM classifier separates the two classes based on the target values, as displayed in Fig. 1. The two classes are separated by their individual support vector lines and the linear hyperplane, which lies at an equal, maximum distance from the malignant and benign classes. In Fig. 6, the upper dots identify the malignant class and the lower dots identify the benign class.
We have observed that linear SVM classifiers are efficient and work properly, but available datasets are not always linearly separable. For non-linear datasets, the kernel-based SVM classifier plays an important role. A kernel is a method for calculating the dot product between two vectors, and hence kernel functions are generally called a "generalized dot product". In this paper, we have considered two functions, the radial basis function and the polynomial kernel function, and compared their accuracy. The results of the radial basis function and polynomial kernels are shown in Fig. 7 and Fig. 8, respectively.
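To make the "generalized dot product" idea concrete, the following sketch writes the linear, polynomial, and RBF kernels as plain Python functions. The parameterization (`degree`, `coef0`, `gamma`) follows common textbook forms and is an assumption, since the kernel parameters used in the experiments are not stated. The last lines check, for a degree-2 polynomial kernel without offset in two dimensions, that the kernel value equals an ordinary dot product taken after an explicit feature map (`phi2`, an illustrative helper).

```python
import math

def dot(x, z):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, degree=2, coef0=0.0):
    return (dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def phi2(x):
    """Explicit feature map whose dot product equals (x . z)^2 in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 0.5)
kernel_value = polynomial_kernel(x, z)       # computed in the input space
explicit_value = dot(phi2(x), phi2(z))       # computed in the feature space
print(kernel_value, explicit_value, rbf_kernel(x, z))
```

The two printed polynomial values agree up to rounding, which is the point of the kernel: the classifier can work in the larger feature space without ever constructing it.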
The results show that the overall classification accuracy of the linear classifier is 98.13%. Similarly, the accuracy of the polynomial kernel is 96.16% and that of the RBF kernel is 94.74%.

V. CONCLUSION
FA is one of the best ways to represent natural objects, which makes it well suited to the study and analysis of breast cancer. The analysis of previous works shows that limited research has been done on the classification of breast masses using machine learning techniques. We have hybridized two effective methods, fractal geometry and machine learning, to classify malignant and benign cases from breast mammogram images. In our proposed method, we used the box count technique to extract the feature (fractal dimension). After analyzing the fractal dimension of each image, we set a threshold: images with FD less than 1.5 are labeled benign (target=0), and those with FD>1.5 are labeled malignant (target=1). The support vector machine is used to classify the calculated FDs. The procedure is implemented in Python with a dataset of 42 images, and the linear SVM achieved an accuracy of 98.13%. In this procedure, 33 images were used for training and the remaining 9 images were used for testing. Kernel SVM was also implemented and provides lower accuracy than the linear SVM (LSVM). The result analysis shows that the proposed procedure is somewhat noisy for large datasets. We therefore aim to extend our work and develop a more efficient, less noisy technique for large datasets that would give better performance. In future, we will try to use multiple machine learning techniques to test the proposed method on a large database to achieve higher accuracy.
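The labeling rule and the 33/9 split described above can be sketched as follows. The FD values are randomly generated placeholders, not the paper's measurements. The handling of an FD exactly equal to 1.5 (assigned to the malignant class here), the shuffling seeds, and the function names are assumptions, since the text does not specify them.

```python
import random

THRESHOLD = 1.5

def label_from_fd(fd, threshold=THRESHOLD):
    """Return 0 (benign) if FD < threshold, otherwise 1 (malignant)."""
    return 0 if fd < threshold else 1

def split_dataset(samples, n_train=33, seed=0):
    """Shuffle a copy of the samples and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder FD values for 42 images (hypothetical, for illustration only).
generator = random.Random(42)
fd_values = [round(generator.uniform(1.0, 2.0), 3) for _ in range(42)]
dataset = [(fd, label_from_fd(fd)) for fd in fd_values]

train_set, test_set = split_dataset(dataset)
print(len(train_set), len(test_set))
```

Fixing the seed makes the split reproducible, so that accuracy figures from repeated runs can be compared directly.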