Amalgamation of Machine Learning and Slice-by-Slice Registration of MRI for Early Prognosis of Cognitive Decline

Brain atrophy is the degradation of brain cells and tissues to the extent that it becomes clearly evident in the Mini-Mental State Examination and other psychological assessments. It is an alarming state of the human brain that progressively results in Alzheimer's disease, which is not curable, but timely detection of brain atrophy can help millions of people before they reach that stage. In this study we analyzed the longitudinal structural MRI of older adults aged 42 to 96 years from the OASIS-3 open-access database. The nth slice of one subject does not match the nth slice of another subject, because the head position under the magnetic field is not synchronized. As a radiologist analyzes MRI data slice-wise, our system also compares the MRI images slice-wise; we devised a method of slice-by-slice registration by deriving the mid-slice location in each MRI image so that slices from different MRI images can be compared with the least error. Machine learning exploits the information available in an abundance of data and can detect patterns that indicate particular events and states. Each MRI slice is analyzed using simple statistical descriptors and Gray Level Co-occurrence Matrix (GLCM) based statistical texture features computed from whole-brain MRI images. The study explored varied classifiers, namely Support Vector Machine, Random Forest, K-nearest neighbour, Naive Bayes, AdaBoost and Bagging classifiers, to predict how normal brain atrophy differs from brain atrophy causing cognitive impairment. Different hyperparameters of the classifiers were tuned to obtain the best results. The study indicates that Support Vector Machine and AdaBoost are the most promising classifiers for automatic medical image analysis and early detection of brain diseases. AdaBoost gives an accuracy of 96.76% with specificity 95.87%, sensitivity 87.37% and receiver operating characteristic (ROC) curve accuracy 96.3%. SVM gives an accuracy of 96% with 92% specificity, 87% sensitivity and ROC curve accuracy 95.05%.

Keywords—Brain atrophy; registration; Freesurfer; GLCM; texture features; FDR; decision support system; SVM; AdaBoost; Random Forest; Bagging; KNN; Naive Bayes; classification; hyperparameters; GridsearchCV; Sklearn; Python


I. INTRODUCTION
Brain tissues degenerate with aging; a visual difference between a normal and an atrophied brain is shown in Fig. 1. Besides age, many other factors, viz. social and occupational conditions and family history, play a major role in the degradation of brain tissues, causing the person's cognitive skills to decline sharply.
This effect is measurable during clinical judgment trials in the form of the Clinical Dementia Rating (CDR) score. A CDR value of zero means the person is cognitively normal, whereas a value greater than zero indicates brain atrophy and cognitive impairment.
Another biomarker of brain atrophy is the deterioration of the medial temporal lobe structure of the brain, a volumetric change detectable with Magnetic Resonance Imaging (MRI), a pathological test. The goal of this study and experimentation is to map clinical findings to the corresponding pathological findings in MRI scans. The medial temporal lobe is the anatomical and physiological part of the brain responsible for memory retention and retrieval of information; it is where short-term memories become long-term memories. In a sense it is the non-volatile memory of the brain, which becomes volatile in the brain-atrophy state. That is why an affected person remembers only current events and forgets them as the reference is lost, much as a computer's volatile RAM loses its contents after the power is switched off.
Turning to the causes of this dimensional loss, brain atrophy is characterized by deposits of plaques and neurofibrillary tangles (NFTs), which cause loss of neurons and synapses. The loss and the deposits occur simultaneously, which makes them difficult to distinguish and identify. The extent of brain atrophy is determined by its anatomical distribution, from stage I to stage VI [1]; research findings show the major areas affected as: stages I and II the entorhinal cortex, a very small region behind the hippocampus; stages III and IV the hippocampus and amygdala; stages V and VI the neocortex. The severity of the disease, however, is determined by the NFTs. The hippocampus is a very compact area of the brain in the medial temporal lobe; it consists of cortical areas and the main hippocampus. The cerebral cortex is highly folded, as it has to be accommodated within the limited volume of the skull.
The motivation to exploit machine learning and computer-based image processing is that radiologists sometimes find it very difficult to localize degradation patterns, first because of the complicated and compact structures of the brain described above, and second because individuals show varied patterns. The MRI data itself consists of complicated 3D images, each made up of several 2D slices. It becomes very cumbersome for a radiologist to scan each slice and derive the correlations. In this study we designed a computer-aided decision support system for automatic detection using machine learning techniques, which helps a radiologist reach faster, easier and more accurate decisions.

II. LITERATURE REVIEW
The past few decades have proved promising in early experimentation and studies on the detection of medical conditions using machine learning as a tool in combination with image processing.
The advancement in medical technology has made data available from various pathology modalities such as X-ray, MRI, fMRI, ultrasound and other advanced scans, along with software to handle these data.
Image processing techniques play a significant role in the accuracy of a study. Some earlier studies used voxel-based morphometry (VBM) [2][3][4]. These studies worked on T1-weighted MRI scans of very small groups of subjects; later they used voxel-based relaxometry (VBR) on T2-weighted scans of the same subjects. In VBM, specific tissue templates were used for voxel-by-voxel comparison, and white matter, grey matter and cerebrospinal fluid were segmented by comparison with reference templates defined by the Montreal Neurological Institute. Surface reconstruction was done voxel by voxel, with voxels of a fixed size in mm. But such procedures were too complicated and compromised accuracy.
Another voxel based morphometry study [5] used the comparisons of intensities of white matter, deep white matter and periventricular deep white matter voxel by voxel.
Another image processing technique, deformation-based image analysis, was used in several studies [6][7][8]. These studies created a reference space and calculated the deformation required to transform each individual image into the reference space. Other deformation-based studies [9] applied the Jacobian determinant of each transformation to measure volume-change patterns. The study [7] applied deformation-based morphometry (DBM) to detect brain changes, using longitudinal DBM to measure volume changes of the same subjects over the period of study.
Tensor-based morphometry is another image processing technique, used in [10][11]. These studies designed 3D metrics of disease-based differences in brain structures, but again it is a very complicated and time-consuming process. Other tensor-based morphometry studies [12][13] created difference tensors between diseased regions and a common anatomical template; at each pixel a colour-coded Jacobian determinant was calculated, giving the differential change in volume at the region of interest.
One study applied data mining [14], where millions of voxels are mined to select a sufficient number of voxels to test the hypothesis with high accuracy.
All the above studies were performed on very small datasets. With changing lifestyles and a growing number of cases of brain atrophy and other brain diseases, the related datasets have grown manifold, giving researchers a wider domain to work on and better results in the early detection of brain diseases, using machine learning as a tool for both image processing and disease identification. The authors in [15] applied machine learning tools to the ADNI (Alzheimer's Disease Neuroimaging Initiative) database. They worked on spatial patterns of abnormalities. It was a massive project carried out with 16-CPU parallel processing, as the AD-PS score computation needs overnight processing using parallel processors. It was an extension of an earlier study [16].
The authors in [17] used the machine learning method SVM (Support Vector Machine) combined with voxel-based morphometry for early detection of brain atrophy using the ADNI database. The classifier is used as an iterator to find the weights associated with each voxel; voxels with particular weight values were selected as features and the rest were dropped, so the voxel features are re-determined at every training level. This study found that classification accuracy depends on the number of subjects in the database.
Texture may be defined as "the feel, appearance or consistency of a surface or a substance". In biomedical image analysis, image texture provides information about micro- and macro-structural changes in tissues and cells. Radiologists train themselves over time to derive a relationship between visual patterns and the molecular and cellular properties of tissues. Radiologists face many problems in evaluating and drawing inferences from biomedical images, such as non-uniformity in image acquisition, interpretation and reporting.
Computer-aided mathematical biomedical image texture analysis aids radiology by interpreting the image in terms of statistical features and signal-variation algorithms, giving a quantitative description of the image. The latest texture-based studies [18]-[24] on brain atrophy MRI are listed in Table 1A.
Limitations of the above studies are: 1) They were constrained to very small datasets, with fewer than 200 subjects except in a few cases. Most of the studies used ADNI1, ADNI2, OASIS-1 and OASIS-2; the recently published OASIS-3 dataset remains a potential dataset to be explored.
2) Most of the studies used a cross-sectional MRI database rather than a longitudinal one, while brain atrophy is a longitudinal phenomenon.
3) Most of the studies are ROI (Region of Interest) based. Such studies need prior and in-depth knowledge of the disease under study, meaning one of the co-researchers must come from a medical background. Even when the image is segmented to obtain the ROI, the classification accuracy depends on the accuracy of the segmentation. Most studies used SPM or FreeSurfer software to obtain the ROI. Most of the above studies consider only the shrinkage of the hippocampus and cerebral cortex and the enlargement of the ventricles. But brain atrophy is not localized to some segments of the brain; it affects the brain as a whole, hence the whole-brain MRI needs to be analyzed slice by slice, as most radiologists do.

III. DATA PRE-PROCESSING
The baseline of sustainable research and development is the infrastructure: data, software and algorithms. This work used an image analysis environment that provided computational tools and facilitated reproducible research and data. The Jupyter notebook is used to provide a flexible and well documented workflow. Python 3 provides very useful library modules, such as SimpleITK [25], Nibabel and Sklearn, which make image processing implementation very easy.
The study used the OASIS-3 MRI dataset, latest release December 2019. It is retrospective data collected over a period of 15 years, consisting of 1098 subjects and more than 2000 sessions. The link to the data is www.oasis-brains.org. The dataset is accompanied by clinical and cognitive assessments. Table 1B lists the demographic details of the subjects.
In our study we took each patient's CDR status at a particular time stamp and tried to classify it for early prognosis of brain atrophy causing cognitive impairment, which may lead to Alzheimer's disease. Machine learning is a data-based approach, and the accuracy of a study depends strongly on data clarity and detail, because data are the building blocks of such studies. Moreover, the data acquisition process is not perfect: MRI scanning yields images which have to be pre-processed to improve the accuracy of the final results, because the scanning process is affected by static magnetic field strength, coil variations, differences in tissue penetration, eddy currents, etc. in the MRI machine. The study used FreeSurfer [26], open-access software specialised for neuroimaging analysis and interpretation of brain MRI data. A set of FreeSurfer scripts was run to implement the pre-processing pipeline described in Fig. 2.

A. Skull Stripping
It is a process to remove non-brain tissues (such as the skull and scalp) from brain MRI images, improving the accuracy of the brain image processing used for early diagnosis and prognosis of various brain-related diseases. Several skull-stripping techniques are used in biomedical image studies:
• Mathematical morphometric method: This method uses edge detection and thresholding criteria to remove non-brain tissues from brain MRIs. It is highly dependent on initial parameters such as threshold values.
• Intensity-based method: This method uses intensity, the basic pixel-level feature of the image, to differentiate non-brain tissues from brain tissues using a histogram or region-growing method.
• Deformable-surface-based method: An active contour, which works like a self-growing contour based on the energy components of a desirable brain mask, is used to separate out the brain tissues. It is a very robust method.

B. Inhomogeneity Correction
Inhomogeneity means that similar brain tissues show different pixel intensities in the MRI scan, while similar tissues should have approximately the same pixel intensities. It arises because the signal intensity is not uniform during the scanning process: different tissues require different signal magnitudes for penetration, so the signal is not kept uniform throughout the scan, and this variation can produce spikes and inhomogeneity in the pixel intensities of the same tissue. To correct it, the signal is modelled with a bias field using either an additive or a multiplicative model; this process is called inhomogeneity correction. If T(x) is the observed image signal with bias field b(x) and noise n(x), and S(x) denotes the true (bias-free) signal, the two models for the observed image are

T(x) = S(x) + b(x) + n(x)   (additive model)

T(x) = S(x) · b(x) + n(x)   (multiplicative model)

Neglecting the noise, the multiplicative model is usually transferred to the logarithmic domain, log T(x) = log S(x) + log b(x), so that the bias becomes additive.
The inhomogeneity correction methods used in this study are:
1) Modified fuzzy C-means: Modified fuzzy C-means segments the brain into three segments: background, white matter and gray matter. To improve the quality of the segmentation it adds two further ingredients: spatial coherence over the t tissue classes (tissues can be white matter, gray matter, cerebrospinal fluid, muscle, fat, skin, skull or background, since signal penetration depends on the type of tissue) and a bias-field estimate b used to smooth the output image signal. Fuzzy C-means jointly segments the image and estimates the bias field so as to minimize the inhomogeneity; the joint objective function can be written as

O = Σ_x Σ_{k=1}^{t} S_kx (T(x) − b_x − t_k)² + (α / N_x) Σ_x Σ_{k=1}^{t} S_kx Σ_{r ∈ N_x} (T(r) − b_r − t_k)²

where t is the number of tissue classes, α is the neighbourhood influence, N_x is the number of neighbours of voxel x, and S_kx is the membership of voxel x in the k-th tissue class. The parameters estimated to minimize O are the class centres {t_k} and the bias-field estimates {b_x}.
2) Non-parametric non-uniform intensity normalization (N3): The FreeSurfer scripts use the N3 method of inhomogeneity correction. N3 is a histogram-based non-uniform intensity correction method. Let S = (s_1, s_2, ..., s_N)^T be the intensities of the N voxels of an MRI scan and b = (b_1, b_2, ..., b_N)^T the corresponding bias field. The histogram of S is a blurred version of that of the true image, due to the convolution with the bias component b. The objective of the algorithm is to reduce this blurring by deconvolution, iteratively estimating a smooth bias model. The metric considered is the coefficient of joint variation,

CJV = (σ_1 + σ_2) / |µ_1 − µ_2|

where (µ_1, σ_1) and (µ_2, σ_2) are the mean and standard deviation of two different tissue types. This metric is optimized when the standard deviation within each tissue class is minimal, reflecting the objective that voxels of one tissue type should have approximately the same intensity. The estimation is done iteratively with a histogram bin count of K = 200.
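To make the metric concrete, the following is a minimal sketch (not the FreeSurfer/N3 implementation) of how the coefficient of joint variation could be computed for two tissue classes, assuming a voxel-intensity array and hypothetical boolean masks for white and gray matter:

```python
import numpy as np

def cjv(image, wm_mask, gm_mask):
    """Coefficient of joint variation between two tissue classes.

    A lower CJV means the intensities within each tissue class are more
    homogeneous, i.e. the bias-field correction is better.
    image: numpy array of voxel intensities; wm_mask, gm_mask: boolean
    arrays of the same shape (assumed to come from a prior segmentation).
    """
    wm, gm = image[wm_mask], image[gm_mask]
    mu1, sigma1 = wm.mean(), wm.std()
    mu2, sigma2 = gm.mean(), gm.std()
    return (sigma1 + sigma2) / abs(mu1 - mu2)
```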

C. Co-Registration
Registration is the most crucial stage of pre-processing because it compensates for variations in data acquisition due to rotational and translational changes in head position; even the brain size may differ between subjects. It helps to quantify the anatomical and morphometric alterations within an individual (longitudinal studies) or across a group of individuals (both longitudinal and cross-sectional studies). A common reference space or template is used: the source image is compared with the template by applying optimal geometric transformations. The template can be a brain image of the same subject, in the case of longitudinal studies, or a commonly available template.

D. Normalization
A technique to obtain a uniform intensity distribution across a group of MRI images, using the histogram equalization method, to improve the accuracy of the study.

E. Smoothing
It is a technique to remove unwanted noise from the MRI image, which would otherwise lead to incorrect results and affect the accuracy of the study.

IV. PROPOSED METHOD
During the study we observed that, even after applying the FreeSurfer registration scripts, the slices of different subjects do not contain similar information, i.e. the inter-subject slices are not exactly aligned, as shown in Fig. 3. As ours is a slice-by-slice study, the Nth slice of subject X should contain almost the same content as the Nth slice of subject Y; moreover, the brain size is not the same for all subjects. We therefore devised a method to synchronize the inter-subject slices (a Python sketch of the procedure is given after the list). The steps of the Mid_Slice_brainsize_Equalization_Method are:
• Find the slice number where data acquisition actually starts, i.e. the first non-empty slice of the MRI scan.
• Find the slice number where data acquisition ends, i.e. the first empty slice after the scan content.
• Take the midpoint between the first non-empty slice and the first empty slice; this is the actual mid slice of the MRI scan. Also compute the length of the scan, i.e. the number of non-empty slices.
• Using the mid slice and the actual brain size (the number of non-empty slices), synchronize the Nth slice of subject X with the Nth slice of subject Y, as shown in Fig. 3.
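The following is a minimal Python sketch of this mid-slice synchronization, assuming the volumes are loaded as 3D NumPy arrays (e.g. with Nibabel) and that slices are taken along the third axis; the function and file names are illustrative, not the exact scripts used in the study:

```python
import numpy as np
import nibabel as nib  # volumes are assumed to be loaded with Nibabel

def scan_extent(volume, axis=2, tol=0):
    """Return (first_nonempty, first_empty, mid_slice, n_nonempty) for one scan.

    A slice is considered 'empty' if all of its voxels are <= tol.
    """
    nonempty = [i for i in range(volume.shape[axis])
                if np.take(volume, i, axis=axis).max() > tol]
    first, last = nonempty[0], nonempty[-1]
    mid = (first + last + 1) // 2          # actual mid slice of this scan
    return first, last + 1, mid, len(nonempty)

def matched_slice(vol_x, vol_y, n, axis=2):
    """Return the slice of subject Y corresponding to slice n of subject X,
    using each scan's own mid slice as the common reference point."""
    _, _, mid_x, _ = scan_extent(vol_x, axis)
    _, _, mid_y, _ = scan_extent(vol_y, axis)
    offset = n - mid_x                     # distance of slice n from X's mid slice
    return np.take(vol_y, mid_y + offset, axis=axis)

# Hypothetical usage:
# vol_x = nib.load("subject_X.nii.gz").get_fdata()
# vol_y = nib.load("subject_Y.nii.gz").get_fdata()
# slice_y = matched_slice(vol_x, vol_y, 120)
```

A scaling step based on the number of non-empty slices (the brain size) can be added on top of this offset-based mapping.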

A. Slice-Wise Multivolume Analysis (SWMA) Design
This is a multivariate approach considering whole-brain slices instead of a Region of Interest (ROI). Earlier studies used ROIs because of their small sample sizes. As our sample set is sufficiently large, our study experimented with whole-brain slices, avoiding the loss of information caused by segmentation and approximation. Each MRI image is a volumetric representation flattened to 256 slices; in computation, each slice is a two-dimensional matrix of order 256×256. The slice-wise multivolume analysis is described in Fig. 4.

B. Feature Extraction
This study uses biomedical texture analysis for feature extraction. Texture analysis is a way of extracting image signatures pixel by pixel in terms of intensities, intra- and inter-pixel relationships, and spectral properties. These can be calculated using mathematical and statistical tools, and image analysis based on them gives consistent, fast and accurate results. The features generated from the texture-based statistical distribution of pixel intensities give quantitative measures of an image that are easily differentiated from each other, which makes image comparison easy. Each element of a slice matrix is the intensity value at a particular pixel. We calculated simple central-tendency statistics of these image slice matrices; these gross values are very helpful in capturing broad characteristics of the slice contents. 1) Mean: It gives a measure of the concentration of data around the centre of the distribution, but it is affected by extreme observations.
2) Standard deviation: It measures how well the mean represents the whole dataset; it gives the dispersion of the data.
3) Skewness: It is a measure of the lack of symmetry. It helps to determine the concentration of observations towards the higher or lower side of the observed data.
In addition to these first-order statistics, texture is characterized using the Gray Level Co-occurrence Matrix (GLCM). Steps to create the GLCM:
• Let x be the pixel under consideration.
• Let M be the set of pixels surrounding x which lie within the considered region.
• Define each element (m, n) of the GLCM as the number of times two pixels of intensities m and n occur in the specified spatial relationship, summing over all such pixel pairs around x.
• To obtain a symmetric GLCM, add the transpose of the GLCM to itself.
• Normalize the GLCM by dividing each element by the sum of all elements.
For a 256×256 slice, the full GLCM is too much data, so we use descriptive quantities (descriptors) computed from the GLCM matrices. Each descriptor is calculated in four directions.
In the descriptor formulas, X_mn denotes the (m, n) element of the normalized symmetric GLCM and N is the number of gray levels. The total number of features from texture analysis is 28. The most important and unique property of these statistical and GLCM features is that they are invariant to geometric transformations of the surface, such as horizontal or vertical translation, rotation, etc.; features should follow this rule of invariance. The features are volumetric signatures of microscopic structures of the brain: the most affected microstructures are the hippocampus, amygdala and temporal horn. Studies show that the volumes of these structures decline with age, but if the rate of change of the volumes over a certain time exceeds the normal change, it indicates developments that may cause brain diseases in the future.
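As an illustration, a per-slice feature vector combining the simple statistics with GLCM descriptors in four directions could be computed as sketched below, using scikit-image's graycomatrix/graycoprops (scikit-image ≥ 0.19 naming); the exact descriptor set and quantisation used in the study may differ:

```python
import numpy as np
from scipy.stats import skew
from skimage.feature import graycomatrix, graycoprops

# 0, 45, 90 and 135 degrees: the four directions of the GLCM descriptors.
ANGLES = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]

def slice_features(slice_2d):
    """Illustrative feature vector for one 256x256 MRI slice.

    Assumes intensities are already quantised to the 0-255 range.
    """
    img = slice_2d.astype(np.uint8)
    feats = [slice_2d.mean(), slice_2d.std(), skew(slice_2d.ravel())]

    glcm = graycomatrix(img, distances=[1], angles=ANGLES,
                        levels=256, symmetric=True, normed=True)
    for prop in ("contrast", "homogeneity", "energy", "correlation"):
        feats.extend(graycoprops(glcm, prop).ravel())  # one value per direction
    return np.asarray(feats)
```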

C. Feature Selection
Feature extraction, feature selection and classification share very thin boundaries: a good feature extraction and selection technique makes classification easy and accurate, while a sufficiently strong classifier may not need a sophisticated feature extractor or selection technique. As the features are the input to the classifiers, either the features should be good enough that classification has the least error, or the classification algorithm should be smart enough to extract the correct information, with the least classification error, even from features that provide little information.
Every classifier works on a discriminant function F_ci(X); the classifier, as described in Fig. 5, assigns a feature vector X to class c_k if F_ck(X) > F_cj(X) for all j ≠ k.
The objective of this function is to create a boundary or hyperplane in feature space which distinguishes the n classes. The hyperplane can be represented by the equation

w^T X + b = 0

where w is the weight vector and b the bias. However, the discriminability of the classifier function is affected by decision bias, degrading the classification accuracy and the other scores. The variance σ is also biased, meaning the variance of a sample feature is not as expected.
Theoretically, when we extract features we hope that each feature contributes to some extent to the discriminant function, i.e. that all features are independent, but in practice this is often not true. Table II shows the discriminatory performance of the basic statistical features in this study and Table III shows the discriminatory performance of the GLCM features. Classification accuracy also depends on dimensionality: when performance with a given feature set is inadequate we may add more features to improve it, at additional computational cost, but in practice adding features generally increases performance only up to a point, after which performance decreases. Our study applied the Fisher Linear Discriminant (FLD). It is based on a simple criterion: if the means of a feature in the two classes differ more than its variances, the feature provides better ability to discriminate between the two classes. The vector w in the decision function enters through a scalar (dot) product with X, as in equation (vii); the direction of this vector is important, not its magnitude. The FLD selects features for which

FDR = (m_1 − m_2)² / (σ_1² + σ_2²)

is maximum, where m_1 and m_2 are the means of the feature in the two classes and σ_1 and σ_2 are the corresponding standard deviations. This quantity is called the Fisher Discrimination Ratio (FDR). FDR-based selection is applied with each classifier by adding features one at a time as long as the classifier shows improved accuracy, and stopping as soon as the accuracy or the other scores decrease. Applying FDR to our extracted features, we find that the mean, standard deviation, skewness, homogeneity in two directions and energy in all four directions have the best FDR values; adding other features decreases the accuracy, specificity and sensitivity. However, this does not hold for all classifiers. AdaBoost, Random Forest and the Bagging classifier, being ensemble techniques, are more efficient classifiers and give almost the same accuracy with or without feature selection, but the accuracy of SVM and K-nearest neighbours increases considerably after applying FDR.
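A minimal sketch of the FDR scoring described above, assuming a feature matrix X of shape (n_samples, n_features) and binary class labels y (both hypothetical names), is:

```python
import numpy as np

def fisher_discrimination_ratio(X, y):
    """Per-feature FDR = (m1 - m2)^2 / (s1^2 + s2^2) for a two-class problem.

    Higher values indicate features that separate the two classes better.
    """
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0)
    return num / den

# Features would then be added in decreasing order of FDR, stopping as soon
# as the cross-validated scores of the classifier stop improving:
# order = np.argsort(fisher_discrimination_ratio(X, y))[::-1]
```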

V. CLASSIFICATION MODELS

A. Support Vector Machine
As the objective of a classifier is to find a hyperplane which divides the sample space into desired set of classes with least error, SVM tries to find this hyperplane by processing the input data transferring into higher dimension plane using suitable kernel function so that sample data can be easily classified which cannot otherwise classified in lower dimension plane. The solution vector hyperplane may not be unique. The objective is to find the optimal hyperplane.
If L is the optimal hyperplane and two hyperplanes S and T passing through the nearest vectors in two classes from the optimal hyperplane. Then the distance between the optimal hyperplane L and S or L and T is called margin. The points on the hyperplane S and T are called support vectors, as shown in Fig. 6. These are the vectors which are the most informative for the classifier. The algorithm implements such that the controlling parameters are C and gamma and the kernel. Kernel is the function which converts the input features from lower dimensional plane to higher dimensional plane. C is a regularity parameter which changes the width of margin and gamma decides how much stringent is the classifier to the outliers. The training the data with SVM is that we want the hyperplane margin big enough to generalize the classifier. The C is the costing factor also, if C is large then it gives a large penalty and margin will be small but if C is small less penalty hence margin will be big. But the behavior change also depends on the particular size of sample set, the hyperparameter tuning results vary from model to model. The hyperparameter tuning do have limitations like, hyperparameters values change from dataset to datasets. The best parameters for one dataset may not work perfectly with other datasets. Moreover it is a time consuming process. But Data Processing and classification model evaluating scores really affected by hyperparameter tuning. It gives practical experience of algorithms. The classifier behaviour under various parameters gives an insight of its design. Fig. 7A depicts the hyperparameter tuning C and Gamma to optimize accuracy, Fig. 7B depicts the hyperparameter tuning to optimize specificity and Fig. 7C depicts the hyperparameter tuning to optimize sensitivity.

1) SVM classification with full features:
First the experimentation was carried out with the full feature set. Table IV shows the results of the GridSearchCV method, which internally applies 10-fold cross-validation over a given set of parameters. The best accuracy is 92.95%, with specificity 84.22% and sensitivity 79.28%. The results were checked again with 10-fold cross-validation on hold-out data; they are comparable, with the receiver operating characteristic (ROC) curve area shown in Fig. 8.
2) SVM classification with FDR-selected features: Table V lists the results of GridSearchCV exploring SVM under varying C and gamma, using the subset of features obtained after applying FDR. The highest accuracy is 96.09%, with specificity 92.63% and sensitivity 87.21%. The results were checked again with 10-fold cross-validation on hold-out data; they are comparable, with the ROC curve area shown in Fig. 9.

B. Random Forest
The Random Forest algorithm is a meta-estimator which internally works on N decision trees. Unlike a single decision tree, the result is based on multiple decision trees: the algorithm follows a divide-and-conquer approach, dividing the samples randomly among N decision trees and then aggregating the decisions of all these trees to give the final result. It is a way of taking the advice of N experts rather than one. Being an ensemble approach it is time consuming, but as today's technology handles parallel processing well, the mean time to fit is not an important criterion for evaluating a classifier. One more important observation of the study is that the feature selection process does not much affect accuracy, since Random Forest itself chooses both random sample subsets and random feature subsets; the results with or without FDR are almost the same. The Random Forest classifier proved very stable during the GridSearchCV experiments: the accuracy range does not change much even after tuning the hyperparameters.

1) Random Forest classification with full features: Table VI lists the results of GridSearchCV with all features; the best accuracy is 89.98%, with specificity 88.23% and sensitivity 56.39%. The results were again cross-validated on hold-out data and compared with the ROC area accuracy, as shown in Fig. 10.
2) Random Forest classification with FDR-selected features: The Random Forest hyperparameter tuning results after applying FDR are listed in Table VII; the maximum accuracy is 90.6%, with specificity 87.13% and sensitivity 61.55%, using criterion entropy, max_depth None and 100 estimators. The results were cross-validated on hold-out data, and the ROC area accuracies obtained with the 10-fold cross-validation algorithm are comparable, as shown in Fig. 11.
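A sketch of the corresponding Random Forest grid search, mirroring the hyperparameters mentioned above (criterion, max_depth, number of estimators) with illustrative values, could be:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 10, 20],
    "n_estimators": [100, 200, 500],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="accuracy", cv=10, n_jobs=-1)
# search.fit(X_train, y_train)
```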

C. AdaBoost
Boosting is a process designed to deal with the problem of weak learners. Weak learning results in higher detection errors and low decision accuracy; weak classifiers are moderate classifiers which give only slightly better insight into the problem than random guessing. AdaBoost is a classifier which works with a set of weak classifiers iteratively. Using the same weak classifiers on the same data would not lead to better results, but AdaBoost is designed so that in each iteration the weak classifiers work on subsets of the data rather than the full data, and these subsets may give different results with the weak classifiers. Initially all classifiers are assigned equal weights, but after each iteration the classifiers are judged on the basis of their classification error, and classifiers with less error are given higher weight. AdaBoost is a greedy algorithm whose objective is to minimize the classification error by improving the learning model after each iteration. It is an adaptive boosting algorithm, since it places no bound on the classification error or on the number of weak classifiers.

1) AdaBoost classification with full features:
The AdaBoost algorithm works better with the full feature set. Table VIII shows the GridSearchCV results of AdaBoost with all features, with a maximum average accuracy of 96.76%, specificity 95.87% and sensitivity 87.37%, using learning rate 1 and 150 estimators. AdaBoost outperforms all the other classification methods. The results are cross-validated on hold-out data using the ROC curves shown in Fig. 12.
2) AdaBoost classification with FDR-selected features: FDR degrades the accuracy of AdaBoost. Table IX shows the AdaBoost GridSearchCV results: with 10 features the best accuracy is 91.6%, with specificity 86.15% and sensitivity 68.59%, using 150 estimators and learning rate 1. The results are cross-validated on hold-out data using the ROC curves shown in Fig. 13.
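An illustrative AdaBoost grid search spanning the neighbourhood of the best reported setting (learning rate 1, 150 estimators) might look like this:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 150, 200],
    "learning_rate": [0.5, 1.0, 1.5],
}
search = GridSearchCV(AdaBoostClassifier(), param_grid,
                      scoring="accuracy", cv=10, n_jobs=-1)
# search.fit(X_train, y_train)
```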

D. Bagging Classifier
It is also an ensemble classifier, very similar to the Random Forest classifier. In such classifiers subsets of samples are chosen randomly; in bagging the sampling is done with replacement, so previously selected samples may be drawn again. It is also used to improve the accuracy and other performance measures of decision tree classifiers.

1) Bagging classification with full features:
GridSearchCV results for different parameters are tabulated in Table X. The best accuracy is 86.86%, with specificity 87.25% and sensitivity 38.95%, obtained with a maximum of 200 samples selected from the bag and 200 estimators. The results are cross-verified on hold-out data using the ROC curve accuracy, as shown in Fig. 14.
2) Bagging classification with FDR-selected features: Table XI lists the results of GridSearchCV using the FDR-selected features: the accuracy is 86.1%, with sensitivity 38.95% and specificity 85.9%. The results are cross-verified on hold-out data using the ROC curve accuracy, as shown in Fig. 15.
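A comparable sketch for the Bagging classifier, with max_samples and n_estimators as the tuned parameters (values illustrative, not those of Table X), is:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "max_samples": [100, 200, 400],   # samples drawn (with replacement) per bag
    "n_estimators": [100, 200],
}
search = GridSearchCV(BaggingClassifier(), param_grid,
                      scoring="accuracy", cv=10, n_jobs=-1)
# search.fit(X_train, y_train)
```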

E. Nearest Neighbours
KNN is a non-parametric classifier; it is a lazy but very simple algorithm. To predict a vector X, it looks at the k vectors nearest to X; the distance is generally calculated using the Euclidean or Manhattan metric, which measures the distance between two observations X_s and X_t over the j features.

d(X_s, X_t) = sqrt( Σ_j (x_sj − x_tj)² )    (Euclidean distance)

d(X_s, X_t) = Σ_j |x_sj − x_tj|    (Manhattan distance)

The prediction for a point X is made from its k nearest points: X is predicted as 1 if most of the k nearest points are labelled 1, and −1 otherwise; k is generally chosen odd.

1) KNN classification with full features: The GridSearchCV results of KNN with the full feature set are listed in Table XII; the maximum accuracy is 82.65%, with specificity 60.01% and sensitivity 36.85%, obtained with K equal to 5. The same is verified using hold-out data, as shown in Fig. 16.
2) KNN classification with FDR-selected features: The accuracy increased noticeably using FDR; the results are listed in Table XIII, showing a maximum accuracy of 91.5%, with specificity 81.54% and sensitivity 74.04%, with K equal to 5. The results of Table XIII are verified on hold-out data using the ROC curve in Fig. 17.
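A minimal KNN grid search over k and the distance metric (Euclidean or Manhattan), with illustrative values and hypothetical variable names, is sketched below:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_neighbors": [3, 5, 7, 9],              # k, usually odd
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid,
                      scoring="accuracy", cv=10, n_jobs=-1)
# search.fit(X_fdr_train, y_train)   # X_fdr_train: FDR-selected features
```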

F. Gaussian Naive Bayes
It is a probability-based classifier built on Bayes' theorem, which states that the probability of an outcome can be estimated from the prior probabilities of events. It is a non-parametric algorithm. As there are no major parameters to vary, GridSearchCV testing is not done for Naive Bayes.

1) Naive Bayes classification with full features: Naive Bayes gives an average accuracy of 71.24%, with specificity 85.95% and sensitivity 32.78%. The results are cross-validated with the ROC accuracy on hold-out data, as shown in Table XIV.
2) Naive Bayes classification with FDR-selected features: FDR helped to improve the results to an average accuracy of 74.86%, with specificity 86% and sensitivity 37%. The results are cross-validated with the ROC accuracy on hold-out data, as shown in Table XV.
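Since Gaussian Naive Bayes has no major hyperparameters, a plain 10-fold cross-validation replaces the grid search used for the other models; a minimal sketch (with hypothetical X, y) is:

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def evaluate_naive_bayes(X, y):
    """Mean 10-fold cross-validated accuracy of Gaussian Naive Bayes."""
    scores = cross_val_score(GaussianNB(), X, y, cv=10, scoring="accuracy")
    return scores.mean()
```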

VI. RESULTS AND MODEL EVALUATION
The model is evaluated on the basis of accuracy, specificity, sensitivity and the accuracy from the receiver operating characteristic (ROC) curve. As this is a screening test, the priority is to optimize specificity rather than sensitivity. The metrics are defined from the confusion matrix entries TP (true positive), TN (true negative), FP (false positive) and FN (false negative) as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Specificity = TN / (TN + FP)

Sensitivity = TP / (TP + FN)
We tried to optimize the accuracy, sensitivity and specificity using the GridSearchCV method, which applies 10-fold stratified cross-validation for a given classifier over a given set of input parameters. The evaluation results of the different classifiers with GridSearchCV are listed in the tables above (Tables IV to XV). The experiments were done twice: with and without feature selection by the Fisher Discrimination Ratio method.
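Accuracy and sensitivity (recall) are available as built-in scikit-learn scorers, while specificity can be derived from the confusion matrix; a sketch of custom scorers that can be passed to GridSearchCV's scoring argument is given below (names are illustrative):

```python
from sklearn.metrics import confusion_matrix, make_scorer

def specificity_score(y_true, y_pred):
    """Specificity = TN / (TN + FP), computed from the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tn / (tn + fp)

def sensitivity_score(y_true, y_pred):
    """Sensitivity (recall) = TP / (TP + FN)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tp / (tp + fn)

# e.g. GridSearchCV(..., scoring=make_scorer(specificity_score))
specificity_scorer = make_scorer(specificity_score)
sensitivity_scorer = make_scorer(sensitivity_score)
```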

VII. RESULT COMPARISONS CHARTS
The results of the different classification models are compared in Fig. 18 and Fig. 19.

VIII. CONCLUSION
The objective was to design a decision support system for radiologists which helps them make fast and correct predictions for the early detection of brain atrophy that can develop into Alzheimer's disease. We were able to build a system in which the radiologist inputs the middle slices of the MRI (slice numbers 110 to 140) and, on the basis of the data in these slices, the system outputs a prediction about brain atrophy. The best accuracy, 96.7%, together with the best specificity and sensitivity, is achieved with the AdaBoost classifier. This study achieved better accuracy than earlier research works because of the correct registration method and a better classifier, AdaBoost; it will support radiologists in making better decisions about brain atrophy. As this is a screening test, it is more important to have high specificity than high sensitivity. This is an academic study whose purpose is to explore machine learning classifiers and their parametric behaviour. It also gives hands-on experience of image processing and of how biomedical texture analysis helps to extract image signatures that can be used for classification. It is a comparative study of different classifiers and of how classifier results can be improved using a feature selection criterion, and it also gives insight into which classifiers are strong classifiers whose performance is not much affected by feature selection.

IX. FUTURE WORK
The support system lacks a front end; in future work we can design an automated system which automatically extracts the middle slices, with a proper front end where the radiologist can feed in the DICOM image slices and the system produces a report about them. Many other texture features can be explored to improve the performance, and many other feature extraction methods and classification techniques can be explored for better results. The study consumed much time in preprocessing the data; faster and error-free data preprocessing steps can be explored in future work.

ACKNOWLEDGMENT
This is a practical study carried out under the domain knowledge of Dr Ritesh Garg, Sr. Radiologist, who owns an MRI diagnostic centre. The results were verified under the supervision of the radiologist. Our sincere thanks and gratitude go to Dr Ritesh Garg for his unconditional support while analysing the data; without his help at every point of the analysis, this study would not have been completed.