Classification of Multiple Sclerosis Disease using Cumulative Histogram

Multiple sclerosis (MS) is a chronic disease that affects different body parts, including the brain. Detection and classification of MS brain lesions is of immense importance to physicians for the administration of appropriate treatment. This study therefore investigates an automated framework for the diagnosis and classification of MS lesions in the brain using magnetic resonance imaging (MRI). First, the MRI images of each patient are converted from DICOM to TIF format, in which MS lesions appear clearly in the white matter (WM). This is followed by brain tissue segmentation using a k-nearest neighbor classifier. Then, cumulative empirical distributions, or cumulative histograms (CH), of the segmented lesions are estimated along with other texture/statistical features that exploit the difference in intensity between MS lesions and their surrounding tissues. Finally, these CHs are fused with the statistical features for the classification of MS using a K-means classifier. Experiments are conducted using transverse T2-weighted MR brain scans, which are highly sensitive in detecting MS plaques, from 20 patients, with gold-standard classification obtained by an experienced neurologist. Compared with the statistical features alone, the proposed fusion scored the highest accuracy, 98%, with a false-positive rate of 1%.

Keywords: Cumulative Histogram (CH); Magnetic Resonance Image (MRI); Multiple Sclerosis (MS); White Matter (WM)


I. INTRODUCTION
Multiple sclerosis (MS) is a chronic autoimmune inflammatory disease of the central nervous system that appears in the white matter (WM) [1] [2]. MS can affect the optic nerves, the brain and the spinal cord. It causes impairments in balance, muscle control, vision and other basic body functions [3]. Magnetic resonance imaging (MRI) has become the most precise imaging modality for the detection of MS lesions. Accurate manual evaluation of each lesion in an MR image is a difficult and subjective task with low reproducibility. Automatic segmentation offers an attractive alternative to manual segmentation, a process that remains time-consuming and subject to intra- and inter-expert variability. However, the progression of MS lesions shows considerable variability, and MS lesions present temporal changes in shape, location and area between patients and even for the same patient. This makes the automatic segmentation of MS lesions a challenging problem. The research work presented here aims at creating a robust technique for the automatic segmentation of MR brain damage. In [4], the authors focused on differentiating between active MS and cold-spot lesions in brain MRI. MRI is a cornerstone of the current diagnostic standard because it shows the distribution of WM lesions in space and time with high specificity and sensitivity [5]. The challenge lies in identifying MS in MR images, since the lesions differ in size, shape and location and are subject to anatomical variability [6].
Based on the study presented in [7], techniques can be divided into deformation-based methods and intensity-based methods. In intensity-based techniques, pixels are compared across successive scans. In deformation-based approaches, non-rigid registration between successive scans yields the deformation features. It provides a discrete local displacement field that defines the deformation occurring between two images.
Texture feature analysis has been considered as an alternative quantitative method to classify contrast-enhanced multiple sclerosis plaques [8], classify several sub-areas within lesions undergoing 'active' demyelination [9], differentiate white matter disease plaques from cerebral microangiopathies [10], and differentiate between relapsing and remitting multiple sclerosis plaques [11]. The common texture analysis approaches for classifying MS are statistical and spectral. Statistical texture analysis uses statistical parameters to characterize the texture features of the image. It is divided into first-order features (standard deviation, variance, mean, entropy, CH, skewness, ...), which provide a general and relatively clear evaluation of the pixel distribution, and second-order features such as the run-length matrix (RLM) and the gray-level co-occurrence matrix (GLCM). Spectral analysis, in contrast, examines the pixel patterns that make up a unique texture and the frequency distribution of an image. This approach includes the Fourier transform, the wavelet transform and the Stockwell transform [12].
Texture analysis is a set of operations that characterize the spatial variations of gray-level pixels in MRI. It has the potential to support early detection of MS because it detects slight differences between tissues. In our study, we applied texture analysis to MS lesions and normal tissue, and then combined a group of texture features to differentiate between the tissues.

II. MATERIAL
This study used MRI data from 20 patients (five men and fifteen women) with a mean age of 31 ± 15 years. The data were collected from the ELMOGY and ELRAKHAWY Radiology Medical Centers using gradient-echo T2 imaging on a 3T MRI scanner (Signa Explorer MR) with a phased-array torso surface coil. Approximately 70 axial slices were collected from each patient with a slice thickness of 5 mm.

III. RESEARCH METHOD
Our framework is represented by the flowchart shown in Fig. 1. The proposed analysis pipeline is a supervised approach that begins with registration of the brain MR images. This is followed by a set of pre-processing steps that use a bank of filters to remove the skull. Then the k-nearest neighbours (KNN) algorithm is used to segment each slice of the brain into three tissue parts: WM, grey matter (GM) and MS lesion. Finally, a classification step is performed using a cumulative histogram (CH) feature to differentiate between MS and other tissues. Evaluation is performed using performance parameters such as accuracy, sensitivity and specificity.

A. Preprocessing
Segmentation is not easy to apply to MR images, because the images are affected by changing parameters, partial voluming, noise, interfering intensities, blurred edges, motion, echoes, gradients, normal anatomical variations and susceptibility artifacts [13,14]. We need to discard all unwanted parts of the image that share the same gray scale. Noise can be removed because it appears mostly outside the WM of the brain, whereas the lesion lies within this region [15]. As shown in Fig. 2, the preprocessing stage is divided into two steps. First, artifacts that lead to inaccurate segmentation are removed from the images; from the image processing viewpoint, it is common to simplify all these problems [16]. Second, skull stripping is performed, since fat, skull, skin and other non-brain tissues may cause misclassifications [17].

B. Segmentation using K-Nearest Neighbors
K-nearest neighbor (KNN) segmentation [18] [19] [20] is a statistical pattern recognition technique that distinguishes between different samples (e.g. WM, GM and CSF) by comparing their values in a defined feature space with the values of samples in a learning set. A new pixel is classified by finding its K nearest learning samples according to a closeness measure, usually the Euclidean distance [21]. Commonly, the class that occurs most often among these K learning samples is assigned to the pixel. As shown in Fig. 3, KNN segments the MR image into WM, GM and an output region that is then classified using statistical features.
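The voting rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function name `knn_classify`, the single intensity feature per pixel and the learning-set values are all hypothetical, chosen only to show the Euclidean distance and majority vote.

```python
import math
from collections import Counter


def knn_classify(sample, learning_set, k=3):
    """Return the majority label among the k learning samples nearest to `sample`.

    `learning_set` is a list of (feature_vector, label) pairs and the
    closeness measure is the Euclidean distance.
    """
    # Rank the learning samples by distance to the query and keep the k closest.
    nearest = sorted(learning_set, key=lambda pair: math.dist(sample, pair[0]))[:k]
    # The class that occurs most often among the k neighbours wins the vote.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical learning set: one gray-level feature (0-255) per labelled pixel.
learning_set = [
    ((60,), "WM"), ((65,), "WM"), ((70,), "WM"),
    ((120,), "GM"), ((125,), "GM"), ((130,), "GM"),
    ((200,), "MS"), ((210,), "MS"), ((220,), "MS"),
]

labels = [knn_classify((value,), learning_set, k=3) for value in (62, 128, 205)]
print(labels)
```

In practice the feature vector would usually hold more than one value per pixel (for example intensity plus spatial coordinates), and the choice of K trades noise robustness against boundary sharpness.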

C. Texture Analysis
In this study, we extract texture features from all normal WM regions and MS plaques. Texture analysis falls into two categories, depending on whether a matrix or a vector is used to calculate the features. In our proposed framework, we use CH-based features and compare the resulting accuracy with that of other statistical features. We tested different statistical classification techniques for classifying the texture features of MS plaques and normal tissues, and selected the best-performing techniques.
To calculate all statistical features, we express a given image of size m × n as a function g(a, b) of two variables a and b, where a = 0,1, … , m − 1 and b = 0,1, … , n − 1. The function g(a, b) can take any value j = 0,1, … , Z − 1, where Z is the total number of intensity levels in the image. For histogram-based features, G(j) is the intensity-level histogram, that is, the number of pixels in the whole image that have intensity value j.
The histogram simply presents the image as statistical data. The approximate probability density of occurrence of the intensity levels, P(j), equals the histogram G(j) divided by the total number of pixels in the image, $P(j) = G(j) / (mn)$. The following texture features are obtained from the normalized histogram:
• Median: the middle value of the normalized histogram vector, which has an even number of elements i.
• Standard deviation: a measure of the dispersion of the intensity levels about their mean.
• Skewness: a measure of the asymmetry of the probability distribution of the gray-level intensity about its mean.
• Cumulative histogram (CH): the mapping that counts the cumulative number of pixel intensity values in all of the bins up to the current bin, built from the normalized histogram of the gray-image intensity. The cumulative sum is $CH(j) = \sum_{i=0}^{j} P(i)$ for $j = 0, 1, \ldots, Q - 1$, where Q represents the total number of intensity levels in the image.
• Entropy: a measure of the information carried by the probability density P(j).
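The features listed above can be sketched in plain Python as follows. The sketch assumes the standard first-order definitions, which may differ in detail from the ones used in the study: mean μ = Σ j·P(j), standard deviation σ = sqrt(Σ (j − μ)²·P(j)), skewness = σ⁻³·Σ (j − μ)³·P(j), and entropy = −Σ P(j)·log₂ P(j). The median is taken here as the first intensity level at which the cumulative histogram reaches 0.5, which is also an assumption. The function name `histogram_features` and the two 8-level patches are hypothetical.

```python
import math


def histogram_features(pixels, levels=256):
    """First-order features of a flat list of integer intensities in [0, levels)."""
    n = len(pixels)
    # G(j): number of pixels with intensity j; P(j) = G(j) / n.
    g = [0] * levels
    for value in pixels:
        g[value] += 1
    p = [count / n for count in g]

    # Cumulative histogram: CH(j) = P(0) + ... + P(j), so the last entry is 1.
    ch, running = [], 0.0
    for pj in p:
        running += pj
        ch.append(running)

    mean = sum(j * pj for j, pj in enumerate(p))
    std = math.sqrt(sum((j - mean) ** 2 * pj for j, pj in enumerate(p)))
    # Skewness is undefined for a constant patch; report 0.0 in that case.
    third = sum((j - mean) ** 3 * pj for j, pj in enumerate(p))
    skew = third / std ** 3 if std > 0 else 0.0
    # Empty bins contribute nothing to the entropy.
    entropy = -sum(pj * math.log2(pj) for pj in p if pj > 0)
    # Assumed median: first intensity level where CH reaches one half.
    median = next(j for j, c in enumerate(ch) if c >= 0.5)
    return {"ch": ch, "median": median, "std": std, "skew": skew, "entropy": entropy}


# Hypothetical 8-level patches: darker normal WM and a brighter lesion.
wm = histogram_features([1, 1, 2, 2, 2, 3, 3, 2], levels=8)
ms = histogram_features([5, 6, 6, 6, 7, 7, 6, 5], levels=8)
print(wm["ch"])
print(ms["ch"])
```

In this toy case the two patches have identical spread and entropy, yet their cumulative histograms differ at almost every level: the darker patch saturates at 1 early, while the brighter patch stays at 0 until the high intensities. This is the kind of separation the CH feature is meant to capture.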

D. K-means Classifier
In data mining and statistics, K-means clustering is used for cluster analysis. The purpose of the algorithm is to partition the observations into K groups, where K is the specified total number of clusters. Based on the provided features, the data of each pixel are assigned to one of the K groups. The objective of the algorithm is to minimize the distance between each observation and the centroid of its cluster. It does so by repeatedly assigning each observation to the cluster with the nearest centroid and recomputing the centroids, and it ends when the measured distance reaches its smallest value.
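The iteration described above corresponds to Lloyd's algorithm, sketched below in plain Python. This is an illustrative sketch, not the implementation used in the study: the function name `kmeans`, the explicit initial centroids and the two-dimensional feature vectors are hypothetical, and mapping the resulting cluster indices to "MS" and "normal" is left to the caller.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable.

    `points` and `initial_centroids` are lists of equal-length tuples.
    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical per-region feature vectors: (normalized mean intensity, spread).
points = [(0.20, 0.05), (0.25, 0.04), (0.22, 0.06),
          (0.80, 0.10), (0.85, 0.12), (0.78, 0.09)]
labels, centres = kmeans(points, [points[0], points[3]])
print(labels)
```

K-means is sensitive to initialization, so a practical run would typically repeat the clustering from several starting centroids and keep the partition with the smallest total within-cluster distance.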

IV. RESULTS
The proposed framework has been tested on the data collected from the 20 subjects described in Section II. Five features were extracted from the T2-W MR images: CH, median (MED), standard deviation (STD), skewness (SKW) and entropy (ENT). Performance is evaluated by comparing the results of each classifier with the ground-truth classification of the experienced neurologist. As demonstrated by our experiments, the CH feature showed significant correlation with the detection of MS. It is worth mentioning that MS plaques have a higher gray level than normal WM, as shown in Fig. 4. To evaluate each feature in more detail, we use four basic parameters: false negatives (FN), false positives (FP), true negatives (TN) and true positives (TP). They are defined as follows:
• True Negative (TN) is the number of normal pixels that have been correctly classified.
• True Positive (TP) is the number of MS pixels that have been correctly classified.
• False Negative (FN) is the number of MS pixels that have been misclassified as normal.
• False Positive (FP) is the number of normal pixels that have been misclassified as MS.
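As a hedged illustration of how these four counts turn into performance figures, the sketch below takes MS as the positive class, so that a false negative is an MS pixel labelled normal and a false positive is a normal pixel labelled MS. It then computes sensitivity TP/(TP + FN), specificity TN/(TN + FP), accuracy, FPR and FNR. The function names and the ten-pixel example are hypothetical and are not results from this study.

```python
def confusion_counts(predicted, truth, positive="MS"):
    """Count TP, FP, TN and FN for paired predicted / ground-truth labels."""
    tp = fp = tn = fn = 0
    for pred, true in zip(predicted, truth):
        if true == positive:
            if pred == positive:
                tp += 1  # MS pixel correctly classified
            else:
                fn += 1  # MS pixel missed (labelled normal)
        else:
            if pred == positive:
                fp += 1  # normal pixel wrongly labelled MS
            else:
                tn += 1  # normal pixel correctly classified
    return tp, fp, tn, fn


def performance(tp, fp, tn, fn):
    """Standard rates derived from the four confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }


# Hypothetical labels for ten pixels: four MS and six normal in the ground truth.
truth = ["MS"] * 4 + ["normal"] * 6
predicted = ["MS", "MS", "MS", "normal",
             "normal", "normal", "normal", "normal", "normal", "MS"]

counts = confusion_counts(predicted, truth)
scores = performance(*counts)
print(counts, scores)
```

Note that FPR = 1 − specificity and FNR = 1 − sensitivity, so reporting either pair conveys the same information.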
These parameters are visualized on the output of CH and the other classifiers in Figs. 5 and 6, where red, yellow, blue and green represent FN, TP, TN and FP, respectively. Fig. 7 shows the boxplot ranges of each classifier for MS and normal tissue.

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 6, 2020, www.ijacsa.thesai.org

These parameters are used to evaluate:
1) Sensitivity (SE), also called the true positive rate (TPR), which refers to the proportion of lesions that are correctly classified. It is the ratio of TP plaques to the total number of MS plaques, $SE = TP / (TP + FN)$, where FN counts the MS plaques classified as normal.
2) Specificity (SP), also called the true negative rate (TNR), which refers to the ratio of TN regions to the total number of normal regions, $SP = TN / (TN + FP)$, where FP counts the normal regions classified as MS.

After several tests, the classification performance improved from 82-86% to 95-100% when Wiener and adaptive filters were added to the preprocessing stage. The classification results for two-dimensional images using the statistical features and CH are summarized in Table I, which shows that COMP (the combination of all features) scored the highest accuracy and the minimum FPR, 98% and 1% respectively. CH gives the minimum FNR, 1.78%, which differs only slightly from that of the fused features. Thus, the combined features improve the classification performance, overcome the high false-positive rate of CH, and increase the accuracy from 97% to 98%. Fig. 7 also shows the difference between normal and abnormal tissues.

Comparing these results with previous work, we found that the proposed method has lower FPR and FNR than the other methods. Various studies have examined the relation between different gray levels and texture features [5]. In [22], texture analysis based on the gray-level run-length matrix (RLM) was performed on 110 patients, with an MS classification accuracy of 96.9%.
In [23], MR images of 30 MS patients were used, and statistical texture feature analysis, an autoregressive model and wavelet-derived texture analysis were applied. The accuracy of classifying MS lesions and normal-appearing WM was 96%-100%.
In [4], the work is divided into three models. The approach based on logistic regression (LR) consists of ten texture parameters (Long Run Emphasis, Inverse Difference Moment, Difference Variance, Entropy, Run-Length Nonuniformity, Run Percentage, Homogeneity, Low Gray-Level Run Emphasis, Long Run Low Gray-Level Emphasis, Short Run Low Gray-Level Emphasis); the enhanced lesions were classified correctly in twenty-one patients with SE = 86% and SP = 84%. In [24], intensity subtraction and a deformation field feature were used, giving a TPR of 74.30% and an FPR of 11.86% with a mean Dice similarity coefficient of 0.7. Finally, in [25], data augmentation and an AlexNet transfer learning model were used, and the results are close to those of our method, with a specificity of 98.22% and a sensitivity of 98%.

V. CONCLUSION
In this paper, we proposed a pipeline for the classification of MS lesions using a novel feature obtained by combining statistical features with the cumulative histogram (CH) as post-processing for KNN segmentation. The results of the CH feature classification showed that the MS areas were separated well from the other tissues, even though their characteristics are largely similar to those of WM tissues. Comparing these results with previous work, we found that our results have the minimum FPR and the highest accuracy. In addition, the use of an adaptive filter and a Wiener filter in the preprocessing stage increased the accuracy of the KNN segmentation. The results documented the potential of our framework to classify MS lesions. To further qualify the performance of this work and to improve the sensitivity while decreasing the FPR, our future work will be dedicated to (i) conducting a supplementary study on a larger data set, (ii) using other segmentation algorithms that could improve the performance of the classification features, and (iii) extending our method to 3D imaging.