AUTOMATIC SKIN CANCER IMAGES CLASSIFICATION

Early detection of skin cancer has the potential to reduce mortality and morbidity. This paper presents two hybrid techniques for classifying skin images to predict whether skin cancer is present. The proposed hybrid techniques consist of three stages, namely feature extraction, dimensionality reduction, and classification. In the first stage, image features are obtained using the discrete wavelet transform. In the second stage, the features of the skin images are reduced to the most essential ones using principal component analysis. In the classification stage, two classifiers based on supervised machine learning are developed: the first is a feed-forward back-propagation artificial neural network and the second is a k-nearest neighbor classifier. The classifiers are used to classify subjects as normal or abnormal skin cancer images. Classification success rates of 95% and 97.5% were obtained by the two proposed classifiers, respectively. This result shows that the proposed hybrid techniques are robust and effective.


I. INTRODUCTION
Skin cancer is a major public health problem in the light-skinned population. Skin cancer is divided into non-melanoma skin cancer (NMSC) and melanoma skin cancer (MSC), see figure (1.1). Non-melanoma skin cancer (NMSC) is the most prevalent cancer among the light-skinned population. It is divided into basal cell carcinoma (BCC) (75%), squamous cell carcinoma (SCC) (24%), and other rare types (1%) such as sebaceous carcinoma [19]. The critical factor in the assessment of patient prognosis in skin cancer is early diagnosis. More than 60,000 people in the United States were diagnosed with invasive melanoma in recent years, and more than 8000 Americans died of the disease. The single most promising strategy to sharply cut the mortality rate from melanoma is early detection. Attempts to improve the diagnostic accuracy of melanoma have spurred the development of innovative in-vivo imaging modalities, including total body photography, dermoscopy, automated diagnostic systems, and reflectance confocal microscopy. The use of computer technology in medical decision support is now widespread across a wide range of medical areas, such as cancer research, gastroenterology, heart disease, and brain tumors [10], [12]. Recent work [19] has shown that skin cancer recognition from images is possible via supervised techniques such as artificial neural networks and fuzzy systems combined with feature extraction techniques. Other supervised classification techniques, such as k-nearest neighbors (k-NN), which group pixels based on their similarities in each feature image [5], [8], [9], can also be used to classify normal/abnormal images. Image processing therefore became our choice for early detection of skin cancer, as it is an inexpensive technique. The identification of the edges of an object in an image scene is an important aspect of the human visual system because it provides information on the basic topology of the object from which an interpretative match can be achieved.
In other words, the segmentation of an image into a complex of edges is a useful prerequisite for object identification. However, although many low-level processing methods can be applied for this purpose, the problem is to decide which object boundary each pixel in an image falls within and which high-level constraints are necessary. In this paper, a supervised machine learning technique, the feed-forward back-propagation neural network (FP-ANN), is used as a classifier. The k-NN technique, also supervised, is used to classify images as normal/abnormal, as discussed in the subsequent sections. The wavelet transform is an effective tool for feature extraction because it allows analysis of images at various levels of resolution. However, this technique requires large storage and is computationally expensive [12]. In order to reduce the dimension of the feature vector obtained from the wavelet transform and to increase its discriminative power, principal component analysis (PCA) has been used; PCA reduces the dimensionality of the data and therefore reduces the computational cost of analyzing new data. This paper is organized as follows: section (II) gives a short description of image preprocessing, sections (III) and (IV) detail the proposed hybrid techniques, section (V) contains the simulation and results, and the final section gives the conclusion.

II. SKIN CANCER IMAGES PREPROCESSING
There are many types of skin cancer, and each type has a different color, size, and features. Many skin features, such as hair and color, may affect digital images, as may other factors such as lighting and the type of scanner or digital camera. The preprocessing steps that precede border detection, namely color space transformation, contrast enhancement, and artifact removal, are treated as follows [2]:
i) Color space transformation: Dermoscopy images are commonly acquired using a digital camera with a dermoscope attachment. Due to the computational simplicity and convenience of scalar (single channel) processing, the resulting RGB (red-green-blue) color image is often converted to a scalar image using one of the following methods:
• Retaining only the blue channel (lesions are often more prominent in this channel).
• Applying the luminance transformation, i.e., Luminance = 0.299 × Red + 0.587 × Green + 0.114 × Blue.
• Applying the Karhunen-Loève (KL) transformation [18] and retaining the channel with the highest variance.
ii) Contrast enhancement: Delgado et al. [6] proposed a contrast enhancement method based on independent histogram pursuit (IHP). This algorithm linearly transforms the original RGB image to a decorrelated color space in which the lesion and the background skin are maximally separated. Border detection is then performed on these contrast-enhanced images using a simple clustering algorithm.
iii) Artifact removal: Dermoscopy images often contain artifacts such as black frames, ink markings, rulers, and air bubbles, as well as intrinsic cutaneous features that can affect border detection, such as blood vessels, hairs, and skin lines. These artifacts and extraneous elements complicate the border detection procedure, which results in a loss of accuracy as well as an increase in computational time. The most straightforward way to remove these artifacts is to smooth the image using a general-purpose filter such as the Gaussian (GF), median (MF), or anisotropic diffusion (ADF) filter.
Images are processed to have the same starting properties: a common size, and color segmentation to remove hair, if any, as discussed in the next subsection.
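As an illustration of two of the preprocessing operations described above, the following minimal Python sketch applies the luminance transformation and a 3 × 3 median filter. It is not the implementation used in this work (which was written in Matlab). The function names, the list-of-rows image representation, the window size, and the border handling are all assumptions made for the example.

```python
from statistics import median


def to_luminance(rgb_image):
    """Convert rows of (R, G, B) tuples to a scalar image with the luminance weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def median_filter3(gray):
    """Smooth a scalar image with a 3x3 median filter.

    Assumption: at the image border the window is simply truncated to the
    neighbours that exist, rather than padded.
    """
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            window = [gray[y][x]
                      for y in range(max(0, i - 1), min(height, i + 2))
                      for x in range(max(0, j - 1), min(width, j + 2))]
            out[i][j] = median(window)
    return out
```

A median filter is used here because an isolated outlier pixel (for example a thin hair or a small air bubble) is replaced by the typical value of its neighbourhood, while larger structures are left largely intact.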

A. Segmentation
Segmentation refers to the partitioning of an image into disjoint regions that are homogeneous with respect to a chosen property such as luminance, color, texture, etc. ([3] and [20]). Segmentation methods can be roughly classified into the following categories:
• Histogram thresholding: These methods involve the determination of one or more histogram threshold values that separate the objects from the background.
• Clustering: These methods involve the partitioning of a color (feature) space into homogeneous regions using unsupervised clustering algorithms.
• Edge-based: These methods involve the detection of edges between the regions using edge operators.
• Region-based: These methods involve the grouping of pixels into homogeneous regions using region merging, region splitting, or both.
• Morphological: These methods involve the detection of object contours from predetermined seeds using the watershed transform.
• Model-based: These methods involve the modeling of images as random fields whose parameters are determined using various optimization procedures.
• Active contours (snakes and their variants): These methods involve the detection of object contours using curve evolution techniques.
• Soft computing: These methods involve the classification of pixels using soft-computing techniques including neural networks, fuzzy logic, and evolutionary computation.
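To make the first category concrete, the sketch below shows one common form of histogram thresholding, Otsu's method, in plain Python. Otsu's method is chosen here only as a familiar example; the text does not state which thresholding rule, if any, was used in this work. The function names and the assumption that the lesion is darker than the surrounding skin are illustrative.

```python
def otsu_threshold(gray, levels=256):
    """Return the threshold t that maximises the between-class variance
    of the two groups 'pixels <= t' and 'pixels > t'."""
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[int(value)] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_low = 0   # number of pixels at or below the candidate threshold
    sum_low = 0     # intensity sum of those pixels
    for t in range(levels):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, so the variance is undefined
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        var_between = count_low * count_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def segment(gray, threshold):
    """Binary mask: 1 for pixels at or below the threshold.

    Assumption: the lesion is darker than the background skin.
    """
    return [[1 if value <= threshold else 0 for value in row] for row in gray]
```

For an image with two well-separated intensity groups, any threshold between the groups gives the same between-class variance; this sketch returns the smallest such value.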

III. THE PROPOSED HYBRID TECHNIQUES
The proposed hybrid techniques are based on the discrete wavelet transform (DWT), principal component analysis (PCA), FP-ANN, and k-NN. They consist of the following phases: feature extraction, feature reduction, and classification, as illustrated in

A. DWT based feature extraction
The approach proposed here involves two steps. In the first step, signals (tissue samples) are transformed into the time-scale domain by the DWT. In the second step, features are extracted from the transformed data by principal component analysis (PCA). The continuous wavelet transform of a signal x(t) is defined as the sum over all time of the signal multiplied by wavelets ψ_ab(t) (see Eq. (2)), the scaled (stretched or compressed) and shifted versions of the mother wavelet ψ(t):

$$W_\psi(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi^{*}_{ab}(t)\,dt \qquad (1)$$

$$\psi_{ab}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right) \qquad (2)$$

where ψ*(·) represents the complex conjugate of the wavelet function ψ(·). The time-frequency resolution depends on the scaling parameter a. For smaller a, ψ_ab(t) has a narrow time support and therefore a wider frequency support. When the parameter a increases, the time support of ψ_ab(t) increases as well and the frequency support becomes narrower. The translation parameter b determines the localization of ψ_ab(t) in time. The DWT is defined by taking discrete values of a and b. The full DWT for signal x(t) can be represented as an expansion in the functions φ_{j0,k}(t) and ψ_{j,k}(t), the dilated and shifted versions of the basic scaling function φ(t) and the mother wavelet function ψ(t), with µ_{j0,k} (j < j0) and ω_{j,k} as the scaling coefficients and the wavelet coefficients, respectively. Generally, scales and positions are based on powers of 2, which gives the dyadic DWT. Once a mother wavelet is selected, the wavelet transform can be used to decompose a signal according to scale, separating the fine-scale behavior (detail) from the large-scale behavior (approximation) of the signal [1]. The relationship between scale and signal behavior is as follows: a low scale corresponds to a compressed wavelet and to rapidly changing details, namely high frequency; a high scale corresponds to a stretched wavelet and to slowly changing coarse features, namely low frequency. Signal decomposition is typically done in an iterative fashion using the scales a = 2, 4, 8, ..., 2^L, with successive approximations being split in turn, so that one signal is broken down into many lower-resolution components. In practice, signal decomposition can be implemented in a computationally efficient manner via the fast wavelet transform developed by Mallat [15], whose basic idea is to represent the wavelet basis as a set of high-pass and low-pass filters in a filter bank, as shown in figure (3.2). Figure (3.3) shows the three-level DWT of an SSM image.
The first level consists of LL, HL, LH, and HH coefficients. The HL coefficients correspond to high-pass in the horizontal direction and low-pass in the vertical direction. Thus, the HL coefficients follow horizontal edges more than vertical edges. The LH coefficients follow vertical edges because they correspond to high-pass in the vertical direction and low-pass in the horizontal direction. As a result, there are 4 sub-band (LL, LH, HH, HL) images at each scale. The sub-band LL is used for the next 2D DWT. The LL subband can be regarded as the approximation component of the image, while the LH, HL, and HH subbands can be regarded as the detailed components of the image. As the level of the decomposition increased, we obtained more compact yet coarser approximation components. Thus, wavelets provide a simple hierarchical framework for interpreting the image information. In our algorithm, level-3 decomposition via Haar wavelet was utilized to extract features.
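The level-3 Haar decomposition described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the Matlab code used in this work: the function names, the orthonormal 1/√2 scaling, and the way the coefficients are concatenated into one feature vector are assumptions. Sub-band names follow the convention in the text (HL is high-pass in the horizontal direction and low-pass in the vertical direction).

```python
def haar_step_1d(values):
    """One Haar analysis step: scaled pairwise sums (low-pass) and differences (high-pass)."""
    s = 2 ** 0.5
    low = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def filter_columns(matrix):
    """Apply the Haar step along the vertical direction (down each column)."""
    lows, highs = [], []
    for col in transpose(matrix):
        lo, hi = haar_step_1d(col)
        lows.append(lo)
        highs.append(hi)
    return transpose(lows), transpose(highs)


def haar_dwt2(image):
    """Single-level 2D Haar DWT of an image with even height and width."""
    row_low, row_high = [], []
    for row in image:                      # filter along the horizontal direction
        lo, hi = haar_step_1d(row)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = filter_columns(row_low)       # horizontal low-pass, vertical low/high
    hl, hh = filter_columns(row_high)      # horizontal high-pass, vertical low/high
    return ll, lh, hl, hh


def haar_features(image, levels=3):
    """Decompose the LL band repeatedly and flatten all coefficients into one vector.

    Assumed ordering: final approximation first, then the LH, HL, HH details
    of each level from finest to coarsest.
    """
    details = []
    approx = image
    for _ in range(levels):
        approx, lh, hl, hh = haar_dwt2(approx)
        for band in (lh, hl, hh):
            details.extend(v for row in band for v in row)
    return [v for row in approx for v in row] + details
```

For an 8 × 8 input, three levels reduce the LL band to 4 × 4, 2 × 2, and finally 1 × 1, so the flattened vector has 1 + 3 × (16 + 4 + 1) = 64 coefficients, the same number as the input pixels.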

B. PCA based feature reduction
Principal Component Analysis (PCA) is one of the most successful techniques that have been used in image recognition and compression. The purpose of PCA is to reduce the large dimensionality of the data space (observed variables) to the smaller intrinsic dimensionality of the feature space (independent variables) needed to describe the data economically. This is the case when there is a strong correlation between the observed variables. The tasks that PCA can perform include prediction, redundancy removal, feature extraction, and data compression. Because PCA is a classical technique that operates in the linear domain, it suits applications with linear models, such as signal processing, image processing, system and control theory, and communications. Given a set of data, PCA finds the linear lower-dimensional representation of the data such that the variance of the reconstructed data is preserved ([13] and [14]). Applying PCA to the feature vectors calculated from the wavelets, and limiting the feature vectors to the components selected by the PCA, should lead to an efficient classification algorithm using a supervised approach. So, the main idea behind using PCA in our approach is to reduce the dimensionality of the wavelet coefficients. This leads to a more efficient and accurate classifier. Figure (3.4) gives an outline of the PCA algorithm.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 4, No. 3, 2013, www.ijacsa.thesai.org

Input: Data set (matrix) X of size m × n (m ≡ number of parameters, n ≡ number of samples).
Output: Y = PX, where P ≡ transformation matrix from X to Y, a basis for X such that the covariance matrix in this basis is diagonalized.
Step 1: Calculate the covariance matrix in the new basis.
Step 2: Diagonalize the covariance matrix, i.e., A = EDE^T.
Step 3: Find the eigenvalues and eigenvectors from the previous step: in A = EDE^T, D is the diagonal matrix containing the eigenvalues of A and E is the matrix containing the eigenvectors of A.
Step 4: Dimension reduction:
• Rearrange the eigenvectors and eigenvalues, i.e., sort the columns of the eigenvector matrix E and the eigenvalue matrix D in order of decreasing eigenvalue.
• Select the subset of eigenvectors with the largest eigenvalues as basis vectors, and project the data onto the new, lower-dimensional basis.

Therefore, the feature extraction process was carried out in two steps: first the wavelet coefficients were extracted by the DWT, and then the essential coefficients were selected by the PCA (see Fig. 3.5).
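The steps outlined above can be illustrated on the smallest non-trivial case, two parameters per sample, where the eigen-decomposition of the 2 × 2 covariance matrix has a closed form. The sketch below is only an illustration under that restriction (this work reduces wavelet feature vectors to eight components); the function name, the mean-centering, and the 1/(n − 1) normalisation are assumptions.

```python
from math import sqrt


def pca_2d(samples):
    """PCA for samples with two parameters each.

    Returns (eigenvalues, eigenvectors, projections): eigenvalues sorted in
    decreasing order, matching unit eigenvectors, and each sample projected
    onto the first principal direction (reduction from 2 dimensions to 1).
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    centered = [(x - mean_x, y - mean_y) for x, y in samples]

    # Covariance matrix A = [[a, b], [b, c]] of the centered data.
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    mid = (a + c) / 2
    half = sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + half, mid - half

    # Eigenvector for lam1; the second one is perpendicular to it.
    if abs(b) > 1e-12:
        v = (b, lam1 - a)
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = sqrt(v[0] ** 2 + v[1] ** 2)
    e1 = (v[0] / norm, v[1] / norm)
    e2 = (-e1[1], e1[0])

    projections = [x * e1[0] + y * e1[1] for x, y in centered]
    return (lam1, lam2), (e1, e2), projections
```

When the two parameters are perfectly correlated, the second eigenvalue is zero and keeping only the first component loses no information, which is the situation in which dimensionality reduction is most effective.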

IV. SUPERVISED LEARNING CLASSIFICATION
Supervised machine learning is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances. In other words, the goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to the testing instances, where the values of the predictor features are known but the value of the class label is unknown. In this paper, two supervised learning algorithms are employed for classification: k-nearest neighbors and an artificial neural network. Details of both are given below.

A. k-Nearest neighbors(k−NN ):
One of the most straightforward instance-based learning algorithms is the nearest neighbor algorithm. k-NN is based on the principle that the instances within a dataset will generally exist in close proximity to other instances that have similar properties [4]. If the instances are tagged with a classification label, then the value of the label of an unclassified instance can be determined by observing the class of its nearest neighbors. k-NN locates the k nearest instances to the query instance and determines its class by identifying the single most frequent class label. Figure (4.1) shows a description of the algorithm. The training phase for k-NN consists of simply storing all known instances and their class labels. A tabular representation can be used, or a specialized structure such as a kd-tree. If we want to tune the value of k and/or perform feature selection, n-fold cross-validation can be used on the training dataset. The testing phase for a new instance t, given a known set I, is as follows:
1) Compute the distance between t and each instance in I.
2) Sort the distances in increasing numerical order and pick the first k elements.
3) Compute and return the most frequent class in the k nearest neighbors, optionally weighting each instance's class by the inverse of its distance to t.
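The three testing steps above translate almost directly into code. The following Python sketch uses unweighted majority voting; the Euclidean distance, the default k = 3, and the function name are assumptions for illustration, since the text does not state the metric or the value of k used in this work.

```python
from collections import Counter
from math import dist


def knn_classify(train, query, k=3):
    """Classify `query` from `train`, a list of (feature_vector, label) pairs."""
    # 1) Compute the distance between the query and each stored instance.
    distances = [(dist(features, query), label) for features, label in train]
    # 2) Sort the distances in increasing order and keep the first k.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3) Return the most frequent class among the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With two classes, an odd k avoids tied votes. The optional inverse-distance weighting mentioned in step 3 is omitted here to keep the sketch short.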

B. Artificial Neural Network (ANN):
An artificial neural network (ANN), usually called a neural network (NN), is a mathematical or computational model inspired by the structure and/or functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase [21]. The feedforward neural network was the first and arguably simplest type of artificial neural network devised. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any), and to the output nodes (figure (4.2)). There are no cycles or loops in the network. The neural network employed as the classifier in this study had three layers, as shown in figure (4.2). The first layer consists of eight input elements, in accordance with the size of the feature vectors that were selected from the wavelet coefficients by the PCA. The number of neurons in the hidden layer was five. The single neuron in the output layer was used to represent normal and abnormal tissue images (see Fig. 4.2). The most frequently used training algorithm in classification problems is the back-propagation (BP) algorithm, which is also used in this paper. The details of the back-propagation (BP) algorithm are well documented in the literature [7]. The neural network has been trained to adjust the connection weights and biases in order to produce the desired mapping.
At the training stage, the feature vectors are applied as input to the network, and the network adjusts its variable parameters, the weights and biases, to capture the relationship between the input patterns and outputs [7]. The performance is measured by the mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_t^{(i)} - y_o^{(i)}\right)^2$$

where y_t is the target value, y_o is the actual output, and m is the number of training data. Figure (4.3) shows the ANN back-propagation algorithm.
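A network of the shape described above (eight inputs, five hidden neurons, one output) trained by back-propagation can be sketched in plain Python as follows. This is a minimal illustration, not the Matlab implementation used in this work: the sigmoid activation, the weight initialisation range, the learning rate, the fixed number of epochs as the stopping criterion, and all names are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class FeedForwardNet:
    """Feed-forward network with one hidden layer and a single output neuron."""

    def __init__(self, n_in=8, n_hidden=5, seed=0):
        rng = random.Random(seed)
        # One weight row per hidden neuron; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Forward pass: return (hidden activations, network output)."""
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.w_out[-1])
        return hidden, out

    def train_example(self, x, target, lr=0.5):
        """One back-propagation update for a single (input, target) pair."""
        hidden, out = self.forward(x)
        # Error at the output unit times the sigmoid derivative.
        delta_out = (target - out) * out * (1.0 - out)
        # Hidden deltas use the output weights from before the update.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):      # hidden-to-output weights
            self.w_out[j] += lr * delta_out * h
        self.w_out[-1] += lr * delta_out
        for j, row in enumerate(self.w_hidden):   # input-to-hidden weights
            for i, v in enumerate(x):
                row[i] += lr * delta_hidden[j] * v
            row[-1] += lr * delta_hidden[j]


def train(net, data, epochs=500, lr=0.5):
    """Repeat the per-example update for a fixed number of passes over the data."""
    for _ in range(epochs):
        for x, target in data:
            net.train_example(x, target, lr)


def mse(net, data):
    """Mean square error between targets and network outputs."""
    return sum((t - net.forward(x)[1]) ** 2 for x, t in data) / len(data)
```

In use, each training pair would hold an eight-component feature vector and a target of 0 (normal) or 1 (abnormal); comparing `mse` before and after `train` gives a quick check that the weights are moving in the right direction.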

V. SIMULATION:
In this section the proposed hybrid techniques are used for the classification of malignant melanoma. The data set consists of 40 images in total (20 normal and 20 abnormal) downloaded from [11]. Figure (5.1) shows some samples from the data used. The proposed approach was implemented in Matlab 7.12 [16] on a PC with the following configuration: 2 GHz processor, 2 GB of RAM, running the MS Windows 7 operating system. Figure (6.3) shows the GUI of the implementation. The images were resized to 8 × 8 pixels, a Haar wavelet with the third level of decomposition was used, and the feature vector was reduced to just eight components. With those settings, the hybrid classifiers FP-ANN and k-NN were run, and an instance of the output is shown in Fig. (6.3).

Figure (4.3): ANN back-propagation algorithm.
Initialize the weights in the network (often randomly).
Do
For each example e in the training set:
a) O = neural-net-output(network, e); forward pass,
b) T = teacher output for e,
c) Calculate error (T − O) at the output units,
d) Compute δ_wh for all weights from hidden layer to output layer; backward pass,
e) Compute δ_wi for all weights from input layer to hidden layer; backward pass continued,
f) Update the weights in the network.
Until all examples classified correctly or stopping criterion satisfied.
Return the network.

A. Performance Evaluation
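In this section, the performance of the proposed techniques is evaluated for the skin cancer images in terms of the confusion matrix, sensitivity, specificity, and accuracy. The three terms are defined as follows [17], where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives:
a) Sensitivity: the probability that the diagnostic test is positive for an abnormal case, Sens. = TP / (TP + FN).
b) Specificity: the probability that the diagnostic test is negative for a normal case, Spec. = TN / (TN + FP).
c) Accuracy: the probability that the diagnostic test is performed correctly, Acc. = (TP + TN) / (TP + TN + FP + FN).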
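As a worked example of these measures, the short Python sketch below computes them from confusion-matrix counts. The counts used are not taken from a reported confusion matrix; they are inferred from the figures quoted in the conclusion (100% sensitivity, 95% specificity, 97.5% accuracy) on the 40-image data set with 20 normal and 20 abnormal images, assuming that "abnormal" is the positive class.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Assumed counts: all 20 abnormal images detected, 19 of 20 normal images
# correctly rejected, one normal image flagged as abnormal.
sens, spec, acc = diagnostic_metrics(tp=20, fn=0, tn=19, fp=1)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Under these assumed counts, a single misclassified normal image out of 40 accounts for both the 95% specificity and the 97.5% accuracy.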

VI. CONCLUSION
In this paper, an automated medical decision support system for skin cancer was developed with normal and abnormal classes. First, the discrete wavelet transform was applied to the images to obtain the feature vectors. As the dimensionality of these vectors is quite large, it was reduced through principal component analysis. The resulting feature vectors have only a few components, which means lower time and memory requirements. Afterwards, those vectors were used for classification with either a feed-forward neural network or the k-nearest neighbor algorithm. The results of the deployed techniques were promising: we obtained 100% sensitivity, 95% specificity, and 97.5% accuracy.