Automated Feature Extraction for Predicting Multiple Sclerosis Patient Disability using Brain MRI

Abstract: Predicting the disability level of Multiple Sclerosis (MS) patients is important because it could support better diagnosis and monitoring of disease progression. The Expanded Disability Status Scale (EDSS) is a common protocol used to score the disability level manually. However, it is time-consuming, requires expert knowledge, and is exposed to inter- and intra-subject variation. Many previous studies focused on predicting patients' disability from multiple MRI scans with manual or semi-automated feature extraction. Furthermore, all of them required patient follow-up. This study aims to predict MS patients' disability using fully automated feature extraction, a single MRI scan, a single MRI protocol, and no patient follow-up. Data from 65 MS patients, collected from multiple centers in Iraq and Saudi Arabia, were used. Automated brain abnormality segmentation and automated segmentation of the brain lobes and periventricular brain area were used to extract large-scale features. A linear regression algorithm was used to predict different types of MS patient disability. Initially, performance was weak; it improved once the MS patients were divided into four groups according to MRI field strength (Tesla) and whether or not the patient had a lesion in the spinal cord. The best performance was an average RMSE of 0.6 when predicting the EDSS with a step of 2. These results demonstrate that prediction is possible with fully automated feature extraction, a single MRI scan, a single MRI protocol, and no patient follow-up.


I. INTRODUCTION
Multiple Sclerosis (MS) is a chronic, progressive autoimmune condition that affects the central nervous system (brain and spinal cord). MS occurs when the immune system attacks the myelin that protects the nerve fibres in the brain and spinal cord [1]. The exact cause of MS is still unknown. However, several risk factors have been suggested as possible causes, such as genes, lack of sunlight, lack of vitamin D, smoking, race, climate, teenage obesity, viral infections and being female [4]. MS is considered a rare disease in Asia [2][3][4], with a prevalence estimated between 0 and 35 per 100,000 [2,3,5,6,8], which limits the sample sizes available for study. Brain and spinal cord MRI is one of the most significant paraclinical tests that aid the diagnosis of MS and can help to substitute for clinical findings. MRI plays a key role in the investigation, diagnosis, treatment decisions, monitoring of treatment response, and monitoring of disease progression in MS. The most significant MRI findings related to MS are the location, type, size, and number of MS lesions [8]. Almost all MS lesions can be seen in Fluid-attenuated inversion recovery (FLAIR) MRI, one of the most commonly used MRI protocols for MS diagnosis and for monitoring disease progression. MS lesions in FLAIR MRI are typically hyperintense.
The Expanded Disability Status Scale (EDSS) is the gold standard for scoring the clinical disability level of MS patients [9]. EDSS is a clinician-administered assessment scale used to evaluate the functional systems of the central nervous system. EDSS scores range from 0, which indicates no disability, to 10, which indicates death due to MS, in increments of 0.5. Fig. 1 shows the EDSS score range with the corresponding disability level and disease progression. To assess EDSS, eight neurological Functional Systems (FS) should be scored by an expert. The scoring range for these eight neurological FS examinations varies from 0-4 to 0-15 [9]. The lowest score means normal FS, while the highest score means complete loss of function in that particular neurological FS. Scoring the MS patient's disability level using EDSS is time-consuming, requires expert knowledge, and is exposed to inter- and intra-subject variation. Predicting the MS patient's disability level with fully automated feature extraction is challenging due to the high inhomogeneity of MRI scans in image size, brain size, image density range, and MRI Tesla, as well as differences in MS type. Some MS patients have lesions in the brain, the spinal cord, or both. Furthermore, some MS-lesions are
Each side of the brain contains four lobes, and each lobe is responsible for controlling specific human activities and tasks. The frontal lobe is significant for cognitive functions and for the control of voluntary movement or activities. The parietal lobe processes temperature, taste, touch, and movement information, while the occipital lobe is primarily responsible for vision. The temporal lobe processes memories, integrating them with sensations of taste, sound, sight, and touch [1]. Brain abnormalities such as MS lesions in a given brain lobe may directly affect the FS related to that lobe.
Thus, classifying brain abnormalities by brain lobe can help to predict which human activities or tasks may be affected by the abnormalities. Furthermore, it has been shown clinically that MS lesions near the periventricular brain area correlate significantly with the patient's disability [21,22,23]. Hence, identifying MS lesions in the periventricular brain area is significant for MS disability prediction.
This study aims to predict the clinical disability level of MS patients using fully automated feature extraction, a single MRI scan, a single MRI protocol, and no patient follow-up. Furthermore, it seeks to identify the MRI features most strongly correlated with the MS patients' disability levels.
The structure of this paper is as follows. Section Ⅱ summarises the recent related work. Section Ⅲ explains the dataset, pre-processing, feature extraction and method used in this study. Section Ⅳ presents the results. Section Ⅴ presents the discussion, conclusion and main limitations of this study.

II. RELATED WORK
Multiple Sclerosis disability prediction has been an active research area in the last few years. MS is a clinically heterogeneous disease. Furthermore, MRI findings and patient disability have traditionally shown a weak correlation. Thus, most previous studies used supporting non-raw MRI data to aid MS disability prediction. The supporting data include general patient information such as age and gender, clinical information such as MS type and treatment plan, and radiological information such as lesion type, lesion location and manual lesion segmentation. Table I summarises the recent previous studies on MS disability prediction. Because they rely on supporting non-raw MRI data, all previous studies can be regarded as using manual or semi-automatic feature extraction [10,11,13,14,16,17]. The main limitations of previous studies can be summarized as follows. First, they use radiological or clinical data, which require human interaction and expert knowledge. Second, they use a huge amount of input data. Third, all are cohort studies and require patient follow-up. Fourth, they neglect spinal cord lesions. Fifth, they use more than one MRI protocol. Sixth, they use manual or semi-automated feature extraction, which requires human interaction [12,15,18].
In contrast to the related work, this study uses a single brain MRI protocol with fully automated feature extraction and no patient follow-up.

A. Patients
2D FLAIR MRI scans of 65 patients from two datasets were used in this study. All patients had a confirmed diagnosis of MS with an EDSS scored by an expert. The first dataset was collected at the MS-Clinic, Baghdad Teaching Hospital, Medical City Complex, Baghdad, Iraq. It consists of 48 patients, 36 females and 12 males, with an average age of 33 years (range 15 to 55 years). The MRI scans were acquired between 2016 and 2021 on 1.5 Tesla scanners at more than 20 centers; the average EDSS score was 2 (range 0 to 5). The second dataset was collected at King Fahad General Hospital, Medina, Saudi Arabia. It consists of 17 patients, 11 females and 6 males, with an average age of 33 years (range 22 to 46 years). The MRI scans were acquired between 2017 and 2018 on 3 Tesla scanners at two centers; the average EDSS score was 1 (range 0 to 6).
Typically, the MS lesion is considered the gold standard marker of brain abnormality. An MS lesion in FLAIR MRI is defined as an area of focal hyperintensity. MS lesions are round to ovoid in shape and range in size from a few millimeters to more than one or two centimeters in diameter [23]. Lesion type, location, size, and number are the most important characteristics that describe brain abnormality for MS lesions. On ordinary visual inspection, the lesions appear only weakly correlated with the EDSS score. Fig. 2 shows examples of FLAIR MRI for patients with different EDSS scores: the size, shape, and number of focal hyperintensity areas are weakly correlated with the EDSS. The yellow circles mark the focal hyperintensity areas, which are mostly considered MS lesions.

B. Methodology
The proposed methodology can be divided into six stages: input data, pre-processing, feature extraction, feature selection, disability prediction algorithm and performance evaluation. The overall proposed methodology is summarized in Fig. 3.
• Input data: A single brain MRI scan and a single MRI protocol (FLAIR) have been used in this study. Because MS behaves differently in patients with and without a spinal cord lesion, a small portion of radiological data has been used to discriminate between patients with a lesion in the spinal cord and those without. No patient follow-up is required.
• Pre-processing: To transform the raw data into a useful and efficient format, five pre-processing stages have been used. They are explained in detail in the following paragraphs.
The brain segmentation process segments the brain area by removing the skull from the MRI image and keeping only the area occupied by the brain. The dark space between the skull and the brain, which is occupied by the CSF, is used to delimit the brain area. The Brain Extraction Tool (BET) [20] has been used for this purpose.
Segmenting the brain abnormality areas associated with MS disability is significant for disability prediction. Multiple sclerosis is a clinically heterogeneous disease: MS brain abnormalities vary widely in size, shape, number, and location, and MRI scans vary widely among MS patients in size, quality, Tesla, and density. Thus, brain abnormality segmentation is challenging. We propose a Dynamic Image Thresholding (DIT) method, based on the mean and standard deviation of the brain volume, to segment brain abnormalities. Brain abnormalities in FLAIR MRI are typically hyperintense. To identify which level of brain hyperintensity has the highest correlation with MS disability, image thresholding was performed at different threshold values using (1) and (2). The threshold level was incremented in steps of 0.05 of the mean and of the standard deviation above the mean. The step value of 0.05 was chosen to be small enough to capture the effect of every small change in the brain hyperintensity level. Typically, the standard deviation of the brain volume is much smaller than the mean. Thus, the increment in (2) is much smaller than that in (1). Fig. 4 and Fig. 5 show examples of DIT segmentation using (1) and (2), respectively, at different values of X.
In (1) and (2), μ = mean density value of the whole-brain volume; X = 1, 2, 3, ..., increasing until the image threshold segments nothing; and σ = patient-based density standard deviation of the whole-brain volume.
The segmented areas in Fig. 4 at X=10 represent brain abnormality areas that are very close to the visible brain abnormalities (the seen lesion). In other words, at X=10 the segmentation captures brain areas whose level of image hyperintensity is very close to that of the seen lesion. At X<10, the segmentation captures brain abnormalities at a hyperintensity level below that of the seen lesion, while at X>10 it captures those above it. The hyperintensity levels that represent brain abnormalities at X<10 and X>10 are hard or impossible to detect with the human eye.
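As an illustration of how the DIT threshold sweep could be computed, the following Python sketch assumes that (1) has the form T = μ + 0.05·X·μ and (2) the form T = μ + 0.05·X·σ. These forms are inferred from the description above and are not the published equations. The function names, the flat list of brain voxel intensities, and the `max_levels` guard are likewise hypothetical.

```python
from statistics import fmean, pstdev

STEP = 0.05  # step value stated in the text


def dit_threshold(mu, sigma, x, mode):
    """Threshold at level x. Both forms are assumptions, not the paper's equations:
    mode "mean" -> mu + STEP * x * mu     (assumed form of (1))
    mode "std"  -> mu + STEP * x * sigma  (assumed form of (2))
    """
    if mode not in ("mean", "std"):
        raise ValueError("mode must be 'mean' or 'std'")
    scale = mu if mode == "mean" else sigma
    return mu + STEP * x * scale


def dit_segment(brain_voxels, mode="mean", max_levels=1000):
    """Map each level X = 1, 2, 3, ... to the indices of voxels brighter than
    the threshold, stopping at the first level that segments nothing."""
    mu = fmean(brain_voxels)
    sigma = pstdev(brain_voxels)
    levels = {}
    for x in range(1, max_levels + 1):  # hypothetical guard against endless sweeps
        threshold = dit_threshold(mu, sigma, x, mode)
        selected = [i for i, v in enumerate(brain_voxels) if v > threshold]
        if not selected:
            break
        levels[x] = selected
    return levels


# Toy "brain": mostly normal tissue plus a few brighter voxels.
voxels = [100] * 90 + [150] * 8 + [200] * 2
by_mean = dit_segment(voxels, mode="mean")
by_std = dit_segment(voxels, mode="std")
print(len(by_mean), len(by_std))  # number of non-empty levels per mode
```

Because σ is usually much smaller than μ, the standard-deviation sweep advances in finer increments and therefore produces more levels before the threshold segments nothing, which matches the remark that the increment in (2) is much smaller than in (1).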
Automated segmentation of the brain lobes and periventricular brain area is challenging for the following reasons. First, the human brain varies widely in shape, size and abnormalities. Second, brain MRI varies widely in quality, size and number of slices. Third, segmentation of the brain lobes and periventricular brain area is usually performed on 3D MRI, so using 2D MRI with a small number of slices is not an easy task. Nevertheless, an automated method to approximately segment the brain lobes and periventricular brain area in 2D MRI has been proposed. A flowchart of the overall process is shown in Fig. 6. The brain lobe segmentation is based on a 3D brain model with labelled lobes. The axial, sagittal and coronal planes of the 3D model are shown in Fig. 7(a), (c), and (e), respectively. The brain lobe segmentation has three main steps. First, resize the 3D brain model to the same size as the 2D brain volume. Second, divide both the brain volume and the 3D brain model into sixteen identical sections. Third, label each section of the brain volume with its corresponding lobe in the 3D brain model. An example of the brain lobe segmentation is shown in Fig. 7(b), (d) and (f) for the axial, sagittal and coronal planes, respectively. Furthermore, the periventricular brain area is segmented by masking the central 75% of the brain volume, which approximately covers the whole periventricular area. Periventricular lesions are located adjacent to the brain ventricular system; a periventricular lesion and the ventricular system are shown in Fig. 8, together with the output of our proposed periventricular area segmentation.
• Feature extraction: Due to the high level of MRI inhomogeneity, large-scale ratio-based feature extraction has been used. The features were extracted from the brain abnormality areas, which were segmented automatically using our proposed DIT method.
Based on the McDonald diagnostic criteria [9], FLAIR MRI features and the disease characteristics, certain types of features have been extracted: lesion location, shape, size, number and density [7]. All of these extracted features were classified according to the brain lobe and periventricular brain area in which they were located.
Then, 3D-based ratio features have been generated from the above-mentioned extracted features using (3), (4) and (5). The features have been generated for all combinations of F, L and V. In total, more than 8200 features have been extracted for every patient. All features are extracted automatically and without human interaction, in contrast to the related work, which uses manual or semi-automated feature extraction [10,11,13,14,16,17].
In (3), (4) and (5), F is one of the following: size, number, mean, maximum, minimum and standard deviation; L is one of the brain lobes: frontal, parietal, temporal and occipital; and V is either the whole brain volume or the periventricular brain area.
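To make the enumeration over F, L and V concrete, the sketch below builds one ratio feature per combination for a single thresholding level. Since (3), (4) and (5) are not reproduced here, the ratio used (the statistic over the segmented voxels of lobe L inside region V, divided by the same statistic over all segmented voxels inside V) is only an assumed stand-in. The voxel record layout and all names are hypothetical, and the lobe label, periventricular flag and connected-component id are assumed to come from the earlier segmentation steps.

```python
from itertools import product
from statistics import fmean, pstdev

LOBES = ("frontal", "parietal", "temporal", "occipital")  # L
REGIONS = ("whole_brain", "periventricular")              # V

# F: statistics over a list of segmented voxels (one dict per voxel).
STATS = {
    "size": lambda vox: len(vox),
    "number": lambda vox: len({v["component"] for v in vox}),
    "mean": lambda vox: fmean([v["intensity"] for v in vox]),
    "maximum": lambda vox: max(v["intensity"] for v in vox),
    "minimum": lambda vox: min(v["intensity"] for v in vox),
    "standard deviation": lambda vox: pstdev([v["intensity"] for v in vox]),
}


def stat(name, vox):
    """Statistic F over a voxel list; an empty list contributes 0."""
    return float(STATS[name](vox)) if vox else 0.0


def ratio_features(voxels):
    """Return {(F, L, V): ratio} for every combination (assumed ratio form)."""
    features = {}
    for f, lobe, region in product(STATS, LOBES, REGIONS):
        in_region = [v for v in voxels
                     if region == "whole_brain" or v["periventricular"]]
        in_lobe = [v for v in in_region if v["lobe"] == lobe]
        reference = stat(f, in_region)
        features[(f, lobe, region)] = stat(f, in_lobe) / reference if reference else 0.0
    return features


# Toy segmented voxels for one patient at one thresholding level.
voxels = [
    {"intensity": 200, "lobe": "frontal", "periventricular": True, "component": 1},
    {"intensity": 180, "lobe": "frontal", "periventricular": True, "component": 1},
    {"intensity": 160, "lobe": "parietal", "periventricular": False, "component": 2},
    {"intensity": 220, "lobe": "occipital", "periventricular": True, "component": 3},
]
features = ratio_features(voxels)
print(len(features))  # one feature per (F, L, V) combination
```

Using ratios rather than raw values is one way to reduce sensitivity to differences in image size and intensity range between scanners, which is the motivation the text gives for ratio-based features.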
• Feature selection: To reduce the dimension of the extracted features, a filter-based feature selection using correlation analysis has been applied to select the features most highly correlated with the EDSS. Pearson correlation was chosen by trial and error.
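A filter of this kind can be sketched in a few lines of Python. The example ranks features by the absolute value of their Pearson correlation with the EDSS and keeps the strongest ones. The cut-off `top_k`, the function names and the toy data are assumptions, since the text does not state how many features were retained.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_features(features, edss, top_k=10):
    """features: {name: per-patient values}. Returns the top_k (name, r)
    pairs ranked by |r| against the EDSS scores."""
    scores = {name: pearson(values, edss) for name, values in features.items()}
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return [(name, scores[name]) for name in ranked[:top_k]]


# Toy data: three hypothetical features for five patients.
edss = [1, 2, 3, 4, 5]
toy_features = {
    "a": [2, 4, 6, 8, 10],   # perfectly correlated with EDSS
    "b": [5, 3, 4, 1, 2],    # negatively correlated
    "c": [1, 1, 1, 1, 1],    # constant, carries no information
}
selected = select_features(toy_features, edss, top_k=2)
print(selected)
```

Ranking by the absolute value keeps strongly negative correlations as well as positive ones, since both are informative for a linear model.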
• Disability prediction: A linear regression algorithm has been used to predict patient disability. All prediction algorithms were run in the MATLAB 2019a software environment. The linear regression parameter settings are listed in Table II. Linear regression was chosen for the following reasons. First, it shows good performance in predicting both the exact EDSS and different ranges of EDSS. Second, it is able to predict an EDSS value even if that value does not exist in the training data. Third, it is more suitable for an unbalanced dataset. Due to the rareness of the disease, most MS datasets lack class balance.
• Performance evaluation: Prediction performance has been tested for different types of disability prediction, including the exact EDSS and different ranges of EDSS. The EDSS has been used as the gold standard to score patient disability. 5-fold cross-validation has been used. The evaluation metrics are RMSE, R-squared, MSE and MAE.
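The prediction and evaluation stages can be illustrated with a small Python sketch of least-squares linear regression under 5-fold cross-validation, reporting the four metrics named above. This is not the MATLAB 2019a configuration of Table II: it assumes a single input feature, deterministic interleaved folds, and metrics pooled over all held-out predictions, and the function names and toy data are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def cross_validate(x, y, k=5):
    """k-fold cross-validation; metrics are pooled over held-out predictions."""
    n = len(x)
    if n < k:
        raise ValueError("need at least k samples")
    predictions = [0.0] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]    # assumed fold layout
        train = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            predictions[i] = slope * x[i] + intercept
    errors = [p - t for p, t in zip(predictions, y)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(y) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot else 0.0
    return {"RMSE": sqrt(mse), "MSE": mse, "MAE": mae, "R2": r2}


# Toy data: one selected feature and EDSS scores for ten hypothetical patients.
feature = [0.10, 0.15, 0.22, 0.30, 0.33, 0.41, 0.48, 0.55, 0.61, 0.70]
edss = [0.5, 1.0, 1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.5]
print(cross_validate(feature, edss, k=5))
```

With several selected features, the same loop applies with a multivariate least-squares fit in place of `fit_line`.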

IV. RESULTS
More than 8200 features were automatically extracted from the brain abnormality areas segmented by DIT to predict MS patient disability. Normally, the EDSS step value is 0.5. However, to investigate the ability to predict disability, performance was tested for five EDSS step levels, from 0.5 (the normal EDSS step) to 2.5. A significant correlation between the extracted features and the EDSS was found after the MS patients were split into four groups based on MRI Tesla and the presence of an MS lesion in the spinal cord. No significant correlation was found before patient grouping.
Tables III and IV present the EDSS prediction performance for the exact EDSS and for different ranges of EDSS. The best performance was an average RMSE of 0.6, obtained for an EDSS step of 2. Compared with previous studies, which predicted the exact EDSS and ranges of EDSS using manual or semi-automated feature extraction and required patient follow-up, the results are promising for predicting MS disability using fully automated feature extraction and no patient follow-up.
MS-lesion location and MRI Tesla play an important role in predicting patient disability levels. MS lesions can be found in the brain, the spinal cord, or both, and lesion location has a high impact on the MS patient's disability level. In this study, a single brain MRI was used to predict the EDSS value, also for patients with lesions in both the brain and the spinal cord. In addition, a small portion of radiological data extracted from MRI reports was used to identify whether or not a lesion was present in the spinal cord. MRI image quality affects performance, since the results for 3 Tesla MRI outperformed those for 1.5 Tesla MRI. This is attributed to the difference in image quality between 1.5 and 3 Tesla MRI: increasing the magnet strength improves how well the extracted features represent brain abnormalities, resulting in better prediction performance. The results show that grouping MS patients according to MRI Tesla and lesion location improved performance, for two reasons. First, different MRI Tesla values provide different image characteristics in terms of clarity, detail, and noise reduction, and provide different amounts of signal received from the human body during an MRI scan [24]. Second, different brain and spinal cord lesion locations lead to different disease symptoms and progression [23].
The EDSS prediction algorithms were tested with EDSS steps from 0.5 to 2.5 to investigate at which step they perform best. An EDSS step of 0.5 represents the traditional scoring of disability level by clinical physical examination. From Tables III and IV, the overall best performance was obtained with an EDSS step of 2.
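One possible reading of an "EDSS step" is that scores are snapped to the nearest multiple of the step before evaluation, so that a step of 2 groups scores into coarser disability ranges. The Python sketch below illustrates that reading only; the text does not define the binning rule, so the rounding convention and the function name are assumptions.

```python
from math import floor

EDSS_STEPS = (0.5, 1.0, 1.5, 2.0, 2.5)  # the five step levels tested


def quantize_edss(score, step):
    """Snap an EDSS score to the nearest multiple of `step` (ties round up).
    This binning rule is an assumption, not taken from the paper."""
    if step <= 0:
        raise ValueError("step must be positive")
    return floor(score / step + 0.5) * step


scores = [0.0, 1.5, 3.5, 6.0]
coarse = {step: [quantize_edss(s, step) for s in scores] for step in EDSS_STEPS}
print(coarse[2.0])
```

Under this reading, a step of 0.5 leaves clinically scored values unchanged, while larger steps trade resolution for a coarser prediction target.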
The proposed EDSS prediction based on DIT shows promising results in predicting the EDSS level on the tested datasets. This suggests that DIT provides a good representation of brain abnormalities.
The main limitation of this study is the small sample size, due to the rareness of the disease. It is highly recommended to group MS patients based on MRI Tesla and on whether lesions are located in the brain only or in both the brain and the spinal cord. The proposed method presents promising results for future studies aiming to predict patient disability levels with fully automated feature extraction from a single MRI scan and without non-MRI data, which may contribute to shorter diagnosis time. Furthermore, this can help to better understand and monitor the progression of MS.