A New Approach for Leukemia Identification based on Cepstral Analysis and Wavelet Transform

Abstract: This paper presents a new leukemia identification method based on Mel-frequency cepstral coefficient (MFCC) feature extraction and the wavelet transform. Leukemia identification measures blood cell features to detect blood cancer in a patient. Blood cell feature extraction transforms the two-dimensional (2D) blood cell image into a one-dimensional (1D) signal and then extracts MFCCs from that signal. Furthermore, the discrete wavelet transform (DWT) of the 1D blood cell signal is used to extract additional MFCC features that assist the identification procedure. In addition, wavelet denoising is used to reduce noise and increase classification accuracy. Feature matching (classification) of the blood cell as a normal cell or a leukemia cell is performed using five different classifiers. Experimental results show that the proposed method reaches a classification accuracy of up to 92.85% with wavelet-based features and is robust in the presence of noise.


I. INTRODUCTION
The probability of recovery for an acute lymphocytic leukemia patient can be increased by early identification of its symptoms. Leukemia (a blood cancer) is a malignant disease seen in people of any age group, children or adults, but it usually affects people in their 50s and 60s.
In the literature, there is a lot of work on leukemia recognition based on many approaches, such as gene expression analysis [1] and holographic microscope images [2].
Artificial intelligence methods rely on automated systems that can speed up identification and make it much easier. In addition, they allow a larger amount of data to be analyzed and can increase classification accuracy, especially in telemedicine applications. Many prediction methods have been used for the analysis and classification of leukemia, such as the KNN algorithm [3]; other prediction methods use endoscopic imaging techniques [4] and image processing techniques [5]. This paper presents a fully automatic method for leukemia identification that classifies blood images to determine whether a blood cell is a normal cell or a leukemia (cancer) cell.
Leukemia image identification is very important for the diagnosis and therapy of cancer patients.
The proposed leukemia identification method transforms the blood cell image into a 1D signal and applies the same processing that is performed on speech signals. This processing uses Mel-frequency cepstral coefficients (MFCCs) for feature extraction, fused with the discrete wavelet transform (DWT) and signal denoising.
MFCCs are applied in speech recognition methods, although their values are not very robust in the presence of additive noise [6]. We propose a leukemia identification system as an application of this idea: the 2D leukemia image is transformed into a 1D signal, and the same processes performed on speech signals are executed.
Lately, wavelet-based features have been employed in various types of research. The DWT provides a good representation in both time and frequency and can capture localized contributions of the signal of interest in the time and frequency domains. Moreover, wavelet denoising is a technique that can be used to reduce noise in a speech signal [7], [8]. In our algorithm, we combine the wavelet transform of the image with features extracted from MFCCs to achieve a better recognition rate.
The rest of the paper is organized as follows. An overview of leukemia, the problem statement, and a brief survey of current research in this field are given in Section 2. The process of extracting features from a leukemia image using MFCCs is discussed in Section 3. The proposed leukemia recognition system, the discrete wavelet transform, and wavelet denoising are summarized in Section 4. Feature matching (classification) is discussed in Section 5. Section 6 presents the experimental results and discussion. Finally, Section 7 summarizes the concluding remarks and future work.

II. RECOGNITION SYSTEM
The leukemia recognition system identifies samples in four phases: feature extraction, a training phase followed by a testing phase, and classification [9], [10]. Only the useful information about the object is kept in the feature extraction process. One of the most familiar feature extraction methods is MFCC [11]. MFCC operates on frames of data, so it takes 1D signals such as speech or voice as input.
Both the training and testing steps include feature extraction. In the training step, each signal is modeled using a set of training data. Features retain only the characteristic information of the signal, and unnecessary information is stripped away. In the testing step, feature extraction is also applied, and the resulting information is compared to the models in the database of leukemia images so that the unknown image can be identified. Finally, classification locates the signal that corresponds to a leukemia image or a normal blood cell image: the system model is built, and its effectiveness is tested by feature matching, in which a set of testing data is compared with the features stored in the database. In the classification phase, each image class is modeled from the set of data samples used in the training step: a set of feature vectors is produced and kept in a database, with all needless information in the training samples deleted and only the distinctive information kept to construct the image models. When an unknown leukemia sample arrives, pattern matching techniques map the features of the unknown sample to the stored models to identify the leukemia class [9], [12].
www.ijacsa.thesai.org
Feature extraction is a vital step in recognizing unknown images. Only the data that describes the signal is selected, and undesirable data is excluded. MFCC is a well-known and effective method for feature extraction from speech signals [13], [14], and it can also be used for face, gesture, palm print, satellite, and iris image identification [15]-[19].
Thus, this method is one of the most effective techniques for feature extraction, mainly for automatic speech and speaker recognition systems.

III. GENERATION OF COEFFICIENTS USING MFCC
The MFCC technique extracts features from a given image. In voice identification, features are obtained by the following steps. First, the one-dimensional signal is separated into short frames (segments) so that its statistical properties are approximately stationary within each frame. Each frame is then windowed, which suppresses the signal at the frame boundaries and emphasizes its center. The signal is transformed to the frequency domain by the FFT, and the Mel scale determines the spacing and width of each filter in the filter bank. The logarithm of the filter-bank outputs is then taken, followed by the discrete cosine transform (DCT). The resulting MFCCs are the final output of the feature extraction process and carry the characteristic information of the image. Matching is then performed between the MFCCs of the given sample and those of the dataset samples to recognize and validate whether the blood cell is a normal cell or a leukemia cell. Before any of this, the 2D image must be converted to a 1D signal, which is then fed to the MFCC algorithm to extract features as is done for a voice signal.
Leukemia identification mainly involves two phases. In the first phase, known as the training step, features are extracted from the leukemia image samples and collected into a dataset. In the second phase, known as the testing phase, features are extracted from a testing sample and matched against the samples in the database. Feature extraction and conversion are common to both the training and testing phases of the leukemia recognition system.
Feature extraction keeps the discriminative information in the image while decreasing the amount of data present in the input image sample. This step is important for distinguishing a leukemia image from a normal blood cell image, because it provides enough information for good leukemia recognition. Many feature extraction techniques can be used in a signal recognition system, such as linear prediction coefficients (LPC), linear predictive cepstral coefficients (LPCC), perceptual linear predictive analysis (PLP), and Mel-frequency cepstral coefficients (MFCC). MFCC is the best known of these and is used in this research. MFCCs represent the signal distribution; they are derived from cepstral analysis and warped to the Mel scale, which emphasizes low-frequency components over high-frequency components.
The steps of voice recognition are the same as those of leukemia recognition, except that the 2D leukemia image must first be converted to a 1D signal, since MFCC works on a 1D signal. Fig. 1 shows the steps from the 2D image to the MFCCs.

A. Input leukemia Image
We apply the MFCC technique to leukemia recognition using the same steps as in voice recognition. The only difference is that the blood cell image is first converted from a 2D image to a 1D signal before the MFCC technique is applied; the rest of the steps are the same.
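The 2D-to-1D conversion can be sketched as below. The text does not state the scan order, so row-by-row (row-major) flattening is an assumption made here for illustration, and `image_to_signal` is a hypothetical helper name.

```python
def image_to_signal(image):
    """Flatten a 2D grayscale image (a list of rows) into a 1D list.

    Assumption: pixels are read row by row, left to right.
    """
    return [pixel for row in image for pixel in row]


# A tiny 2x3 "image" of gray levels.
image = [[0, 10, 20],
         [30, 40, 50]]
signal = image_to_signal(image)
```

Any other fixed scan order (for example, column by column) would serve equally well, as long as the same order is used for training and testing images.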

B. Image to signal conversion
The 1D image signal from the previous step is framed and windowed using a Hamming window, and the fast Fourier transform (FFT) is then applied. The resulting magnitude of the FFT spectrum is warped by a series of Mel filter banks according to the Mel scale. The next step is to take the log of the spectrum, followed by a discrete cosine transform [11].
The mel is the unit of a perceptual scale of perceived pitch, or frequency, of a tone, so the Mel scale is a mapping between the real frequency scale in Hz and the perceived frequency scale in mels. The name comes from the word melody, indicating that the scale is based on pitch comparisons. The mapping is approximately linear below 1 kHz and logarithmic above. Equation (1) converts the actual frequency in hertz to the mel-scale frequency.
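A commonly used form of the hertz-to-mel conversion is $m = 2595 \log_{10}(1 + f/700)$, where $f$ is the frequency in Hz and $m$ the frequency in mels. Several variants of the Mel scale exist, so the constants 2595 and 700 are an assumption here rather than values taken from the text. A minimal sketch of the conversion and its inverse:

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed constants 2595 and 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

With these constants, 1000 Hz maps to approximately 1000 mels, which is consistent with the near-linear behavior below 1 kHz described above.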

C. Windowing
Usually a Hamming or Hanning window is used. In this procedure, every frame is multiplied by a tapering function. After windowing, the output signal is

$$y(n) = x(n)\,w(n), \quad 0 \le n \le N-1,$$

where $y(n)$ represents the output signal, $x(n)$ is the input signal acquired from framing, $N$ is the number of samples within every frame, and $w(n)$ is the Hamming window, given by

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1.$$
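Framing and Hamming windowing can be sketched as follows. The frame length and hop size are hypothetical parameters, since the text does not state the values used, and dropping trailing samples that do not fill a frame is likewise an assumption.

```python
import math


def hamming(n_samples):
    """Hamming window of length n_samples."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def frame_signal(signal, frame_len, hop):
    """Split a 1D signal into (possibly overlapping) frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def window_frames(frames):
    """Multiply every frame sample by sample with a Hamming window."""
    w = hamming(len(frames[0]))
    return [[x * wn for x, wn in zip(frame, w)] for frame in frames]
```

The window equals 0.08 at the frame edges and 1.0 at the center, so each frame is tapered toward its boundaries before the Fourier transform is applied.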

D. Fast Fourier Transform (FFT)
The Fourier transform is performed on each frame of the sliced signal. The FFT maps a signal from the time domain to the frequency domain [20]: the $N$ samples in each frame are converted to the frequency domain. The FFT is a fast algorithm with low computational cost. Because the signal is divided into small frames, the FFT is computed for each frame separately.
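To illustrate what is computed per frame, the sketch below evaluates the discrete Fourier transform directly and returns the magnitude spectrum. This direct $O(N^2)$ form is used only for clarity and is not the FFT algorithm itself; an FFT routine returns the same values at lower cost. The function name is hypothetical.

```python
import cmath


def dft_magnitude(frame):
    """Magnitude spectrum of one real-valued frame.

    Returns bins 0..N//2, the non-redundant half for real input.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags
```

For a constant frame, all energy falls in bin 0; for one cycle of a cosine across the frame, it falls in bin 1.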

E. Mel Scale
The previously calculated spectra are mapped onto the Mel scale to estimate the energy present at each position in the spectrum. The Mel scale with triangular overlapping windows is known as a triangular filter bank. This filter bank is an array of band-pass filters whose spacing and bandwidth are preset and constant along the Mel-frequency axis.
Thus, the Mel scale controls the spacing of the filters and determines their width: as the frequency gets higher, the filters get wider. The appropriately spaced filters give the energy present in the signal at each point. The frequency conversion formula is (1).
We apply the base-10 logarithm to the output spectrum of the Mel filter bank and then apply the DCT. The logarithm compresses the dynamic range, making small values relatively larger and large values relatively smaller, which is important for the DCT calculation.
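A triangular Mel filter bank followed by the base-10 logarithm can be sketched as below. The number of filters, the nominal sampling rate, the Mel-scale constants, and the small floor that avoids taking the logarithm of zero are all assumptions for illustration; the text does not give these values.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters over n_bins spectrum bins covering 0..sample_rate/2.

    Filter edges are equally spaced in mels, so filters widen with frequency.
    """
    max_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [k * (sample_rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_energies(magnitudes, bank, floor=1e-10):
    """Base-10 log of each filter output; floor guards against log10(0)."""
    return [math.log10(max(sum(w * m for w, m in zip(row, magnitudes)), floor))
            for row in bank]
```

Because the edges are equally spaced in mels rather than in Hz, each successive filter spans more spectrum bins, which matches the statement that filters get wider as frequency increases.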

F. Discrete Cosine Transform (DCT)
The DCT is performed on the log Mel spectrum to convert it back to the time (cepstral) domain. The resulting coefficients are named Mel-frequency cepstral coefficients (MFCCs).
In the DCT, $n = 0, 1, \dots, C-1$ indexes the coefficients, $M$ is the number of filters, $c_n$ is the $n$-th MFCC, and $C$ is the number of coefficients. Here $C = 13$, so the total number of coefficients obtained from each frame is 13.
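One common way to write this step is the unnormalized DCT-II, $c_n = \sum_{m=1}^{M} \log_{10}(S_m) \cos\left[\frac{\pi n}{M}\left(m - \tfrac{1}{2}\right)\right]$, where $S_m$ denotes the output of the $m$-th Mel filter. The symbol $S_m$ and the absence of a scaling factor are assumptions; normalization conventions differ between implementations. A minimal sketch that keeps the first 13 coefficients:

```python
import math


def dct_mfcc(log_energies, n_coeffs=13):
    """Unnormalized DCT-II of the log Mel energies, truncated to n_coeffs."""
    m_filters = len(log_energies)
    return [sum(e * math.cos(math.pi * n * (m + 0.5) / m_filters)
                for m, e in enumerate(log_energies))
            for n in range(n_coeffs)]
```

The code indexes filters from 0, so `m + 0.5` plays the role of $m - \tfrac{1}{2}$ in the formula. For a flat log spectrum, only the zeroth coefficient is nonzero, which shows how the DCT concentrates the spectral envelope in a few low-order coefficients.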

G. MFCC Coefficients
The amplitudes of the resulting spectrum are the MFCCs. To summarize the leukemia recognition steps: the image is mapped from 2D to a 1D signal. The 1D signal is split into small frames, and windowing is applied to suppress the frame edges. The signal is transformed to the frequency domain by applying the FFT. The Mel scale provides the size and spacing of the filters, and the signal is filtered by the Mel filter bank. The logarithm of the filter-bank output is taken, and the DCT is applied to obtain the final Mel-frequency cepstral coefficients. The proposed leukemia recognition system consists of signal modeling, feature extraction, and feature matching (classification), as shown in Fig. 2.

IV. THE PROPOSED LEUKEMIA RECOGNITION SYSTEM
In both the training and testing phases, the 2D image is mapped to a 1D signal. MFCC features are obtained from the 1D signal, while the discrete wavelet transform of the signal provides additional features that support the MFCC features of the original signal. Furthermore, wavelet denoising is applied to the signal and to its DWT to obtain further features that assist the MFCC features. In the classification process, the features of the unknown image are used to predict its class, which is either a leukemia image or a normal blood image.

A. Discrete Wavelet Transform (DWT)
The DWT is a mathematical tool that hierarchically decomposes functions of any type, such as a curve, image, surface, or signal. Wavelets are a good technique for representing the details of such a function, and the DWT is a good method for the analysis of non-stationary signals. It represents the signal as a series of approximations at different resolutions, where the low-pass part corresponds to the approximation of the signal and the high-pass part corresponds to the details. This is equivalent to filtering the signal with a bank of band-pass filters whose impulse responses are all approximately scaled versions of a mother wavelet. The filter outputs are usually downsampled so that the number of DWT output samples equals the number of input samples; therefore, the transform introduces no redundancy.
The output features of this DWT vector are added to the MFCC features generated from the original blood cell signal to form a large feature vector that can be used for leukemia identification. These features are more robust in the presence of degradations.
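A single-level DWT and the feature concatenation can be sketched as follows. The Haar wavelet and the single decomposition level are assumptions, since the text does not name the wavelet family or the number of levels; `extract` stands in for any feature extractor such as an MFCC routine.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT; returns (approximation, detail)."""
    if len(signal) % 2:
        # Pad odd-length input by repeating the last sample.
        signal = list(signal) + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_vector(signal):
    """Approximation followed by detail: same length as an even-length input."""
    approx, detail = haar_dwt(signal)
    return approx + detail


def combined_features(signal, extract):
    """Concatenate features of the signal with features of its DWT vector."""
    return extract(signal) + extract(dwt_vector(signal))
```

Each half of the DWT vector has half the input length, so the transform output has as many samples as the input, as stated above.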

B. Wavelet Based Denoising
Wavelet-based denoising improves noise robustness [21]. It consists of the following steps. The first step is decomposition: a suitable filter is chosen, and the wavelet transform is applied to the noisy signal to produce the noisy wavelet coefficients, up to the level at which a proper decomposition is obtained. The second step, which is the vital one, is thresholding: a suitable threshold limit is selected at each level, together with a thresholding technique such as soft or hard thresholding, to best eliminate the noise. The final step is reconstruction: the inverse wavelet transform of the thresholded wavelet coefficients is computed to obtain the denoised signal.
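The three steps (decomposition, thresholding, reconstruction) can be sketched with a one-level Haar transform and soft thresholding. The wavelet, the single level, and the default threshold rule (the universal threshold with a median-based noise estimate) are assumptions for illustration, not choices stated in the text. The input is assumed to have even length.

```python
import math
from statistics import median


def haar_dwt(signal):
    """One-level Haar decomposition of an even-length signal."""
    s = math.sqrt(2.0)
    approx = [(a + b) / s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / s for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, threshold=None):
    approx, detail = haar_dwt(signal)                 # step 1: decomposition
    if threshold is None:
        # Assumed rule: universal threshold with a median-based noise estimate.
        sigma = median([abs(d) for d in detail]) / 0.6745
        threshold = sigma * math.sqrt(2.0 * math.log(len(signal)))
    detail = soft_threshold(detail, threshold)        # step 2: thresholding
    return haar_idwt(approx, detail)                  # step 3: reconstruction
```

Hard thresholding would instead keep coefficients above the threshold unchanged and zero the rest; only `soft_threshold` would need to change.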

V. CLASSIFICATION METHODS
There are many classification methods that can be used to distinguish normal blood cells from leukemia cells. In this paper, we used five different classification techniques:
1) Neighborhood Components Analysis (NCA): This algorithm attempts to maximize a stochastic variant of the leave-one-out KNN score on the training set [22].
2) Support Vector Machines [23]: A radial SVM is used, where 5-fold cross-validation sets the values of $C$ and $\gamma$ within the range (0.5-1.5). The implementation uses the LIBSVM software.
3) Bayes classifier [24]: The class-conditional densities are approximated using the Gaussian density as a kernel function.
4) Naive Bayes kernel classifier (NBK): This is essentially a naive Bayes classifier in which the one-dimensional densities are approximated using a Parzen window density estimate in place of the Gaussian approximation.
5) Discriminant Analysis (LDQ): Linear Discriminant Analysis can only learn linear decision boundaries, while Quadratic Discriminant Analysis can learn quadratic decision boundaries and is therefore more flexible.
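As an illustration of the naive Bayes kernel classifier in item 4, the sketch below estimates each one-dimensional class-conditional density with a Parzen window and multiplies the per-feature densities under the naive independence assumption. The Gaussian kernel shape, the fixed bandwidth, and the function names are assumptions; the text does not specify them.

```python
import math


def parzen_density(x, samples, bandwidth):
    """Gaussian-kernel Parzen window estimate of a 1D density at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                      for s in samples)


def nbk_predict(x, train, bandwidth=1.0):
    """Predict the label of feature vector x.

    train maps each class label to a list of training feature vectors.
    """
    total = sum(len(vectors) for vectors in train.values())
    best_label, best_score = None, -math.inf
    for label, vectors in train.items():
        score = math.log(len(vectors) / total)        # log prior
        for j, xj in enumerate(x):
            column = [v[j] for v in vectors]          # j-th feature of this class
            # Tiny constant avoids log(0) far from all training samples.
            score += math.log(parzen_density(xj, column, bandwidth) + 1e-300)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Replacing `parzen_density` with a single Gaussian fitted to each feature would give the plain naive Bayes variant with a Gaussian approximation.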

A. Experimental Results
In this experiment, the images are preprocessed to unify their backgrounds before they are converted to signals. The original color image is converted to a grayscale image (the green channel), and the histogram of the green channel is obtained. The next step creates a binary image by thresholding at a specific value and removes small objects from it to obtain a clean binary image. The final preprocessing step masks the RGB image with a white background, as shown in Fig. 3.
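The thresholding and masking steps can be sketched as below. The threshold value is a hypothetical parameter, and the sketch assumes that cells are darker than the background in the green channel, which the text does not state. Small-object removal is omitted for brevity.

```python
def green_channel(rgb_image):
    """Use the green channel as the grayscale image."""
    return [[px[1] for px in row] for row in rgb_image]


def binary_mask(gray, threshold):
    """True marks foreground; assumes cells are darker than the background."""
    return [[value < threshold for value in row] for row in gray]


def apply_mask(rgb_image, mask, background=(255, 255, 255)):
    """Keep foreground pixels and paint everything else white."""
    return [[px if keep else background for px, keep in zip(row, mask_row)]
            for row, mask_row in zip(rgb_image, mask)]


# One dark (cell) pixel next to one bright (background) pixel.
image = [[(10, 20, 30), (200, 220, 210)]]
mask = binary_mask(green_channel(image), threshold=128)
clean = apply_mask(image, mask)
```

In practice the threshold would be chosen from the green-channel histogram mentioned above rather than fixed by hand.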
The dataset contains 210 images in total: 107 normal blood cell images and 103 leukemia cell images. Normal blood cell images with their Mel-frequency cepstral coefficients are shown in Fig. 4, while leukemia cell images with their Mel-frequency cepstral coefficients are shown in Fig. 5.
In Fig. 4 and 5, the X-axis is the frame number (MFCCs) derived from the input signal, while the Y-axis is the feature vector values for each frame. After this step, the classification techniques are applied to find the best accuracies.

B. Results and Discussion
In this paper, we used six techniques for extracting features. In the first technique, MFCCs are extracted from the blood cell signals only. In the second, the features are extracted from the MFCCs of the DWT of the blood cell signals. In the third, denoising is applied to the signal, and features are extracted from the MFCCs of the denoised blood cell signals. In the fourth, the features are extracted from the MFCCs of the denoised DWT of the blood cell signals. In the fifth, denoising is applied, features are extracted from both the denoised signals and the DWT of the denoised signals, and these features are concatenated into a single feature vector. In the sixth, denoising is applied only to the signal used for the MFCCs, so the features are extracted from the denoised signals and from the DWT of the blood cell signals, and these features are concatenated into a single feature vector. A comparison between the six experiments is given in Fig. 6.
Fig. 6 illustrates that extracting MFCC features from the DWT of the blood cell signal (the second method) and from the denoised DWT of the blood cell signal (the fourth method) give equivalent recognition rates, and both achieve the best recognition rates in the presence of noise. This result shows the strength of the wavelet features in facilitating the recognition of leukemia images and normal blood images with and without noise. We focus on these two methods (the second and the fourth) in the next experiments and exclude the others.
Receiver operating characteristic (ROC) curves are shown in Fig. 7 and 9 for the five classification techniques applied to the 2nd method and the 4th method, respectively. The ROC curve represents the performance of the classifier output, and the NBayes kernel has the highest area under the curve (AUC) in both Fig. 7 and 9, for features from the MFCCs of the DWT signal (2nd method) and features from the MFCCs of the denoised DWT signal (4th method), respectively. The detailed values of the ROC curves for these two methods are shown in Table 1.
Classification accuracies are shown in Fig. 8 and 10 for the five classification techniques for the 2nd method and the 4th method, respectively. The NBayes kernel has the highest identification accuracy, 92.85%, for the features from the MFCCs of the DWT signal (2nd method).
The detailed values of classification accuracies for the previous two methods are shown in Table 1.
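For reference, the two reported metrics can be computed as in the sketch below. The AUC is computed here through its rank interpretation, that is, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. Encoding leukemia as label 1 and normal as label 0 is an assumption made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every leukemia sample is scored above every normal sample, while 0.5 corresponds to chance-level ranking.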
In summary, Fig. 11 compares the classification accuracies of the five classification techniques used (NBayes kernel, NBayes, LDA Quadratic, SVM Radial, and NCA) for the 2nd method and the 4th method. The most accurate classifier is the Naïve Bayes kernel, the second highest accuracy is shared by NBayes and LDA Quadratic, and the worst is NCA. The experiments illustrate that the best results are achieved by extracting features from the MFCCs of the DWT signal, with and without the presence of noise. The experimental results show that the recommended technique is beneficial for classifying blood cell images as leukemia or normal blood cells, and it is a novel application of the MFCC method, which is mainly used for speech or voice recognition.
The best technique for classifying blood cell images as normal or leukemia cells is the Naïve Bayes kernel classifier.
In future work, the experiment will be carried out on a larger dataset, and MFCC will be combined with different classification techniques to increase the accuracy of leukemia image recognition.

Five classification techniques are used as classifiers in the proposed leukemia recognition method: radial SVM, the Neighborhood Components Analysis (NCA) classifier, the naive Bayes classifier, the naive Bayes kernel classifier, and the Quadratic Linear Discriminant Analysis (LDQ) classifier.

Fig. 6. Recognition rate versus SNR for blood cell images.

Fig. 11. Comparison of classification accuracies for the five classifiers.

TABLE I. RESULT OF CLASSIFICATION ACCURACY AND AREA UNDER THE CURVE

VII. CONCLUSION AND FUTURE WORK
This paper has presented a robust method for identifying leukemia images based on MFCC and wavelet transform techniques. Five different classification techniques are used to determine the best classifier for identification.