Assessment for the Model Predicting of the Cognitive and Language Ability in the Mild Dementia by the Method of Data-Mining Technique

Assessments of cognitive and verbal functions are widely used as screening tests to detect early dementia. This study developed an early dementia prediction model for Korean elderly based on the random forest algorithm and compared its results and precision with those of a logistic regression model and a decision tree model. Subjects of the study were 418 elderly (135 males and 283 females) over the age of 60 in local communities. The outcome was defined as having dementia, and the explanatory variables included digit span forward, digit span backward, confrontational naming, Rey Complex Figure Test (RCFT) copy score, RCFT immediate recall, RCFT delayed recall, RCFT recognition true positive, RCFT recognition false positive, Seoul Verbal Learning Test (SVLT) immediate recall, SVLT delayed recall, SVLT recognition true positive, SVLT recognition false positive, Korean Color Word Stroop Test (K-CWST) colour reading correct, and K-CWST colour reading error. The random forest algorithm was used to develop the prediction model, and the result was compared with a logistic regression model and a decision tree based on chi-square automatic interaction detection (CHAID). The tests with high predictive power in the detection of early dementia were verbal memory, visuospatial memory, naming, visuospatial functions, and executive functions. In addition, the random forest model was more accurate than logistic regression and CHAID. To detect early dementia effectively, screening test programmes composed of tests with high predictive power need to be developed.
Keywords: random forests; data mining; mild dementia; risk factors; neuropsychological test


I. INTRODUCTION
Dementia is rapidly increasing in line with worldwide aging. As of 2013, the global dementia population was over 44 million, and it is expected to more than triple to 135 million by 2050 [1]. In particular, the dementia population in Korea is increasing faster than anywhere else in the world: it was 610,000 as of 2014, and it is predicted to double every 20 years, more than quadrupling to reach 2.71 million by 2050 [2].
The increase in the dementia population is expected to lead to enormous social and economic costs by raising medical costs and various supporting costs. According to a 2014 survey by the National Health Insurance Service, nearly one in two (48.7%) recipients of long-term senior care insurance was a senior with dementia, and the annual medical cost for dementia per patient was reported to be US$ 2,650, which is more than that for cardiovascular diseases (US$ 1,130) and diabetes (US$ 505) [3]. In addition, the number of seniors who received treatment as outpatients increased from 8.2 persons per 100,000 in 1999 to 66.4 in 2010, which is around an eight-fold increase. Total supporting costs for dementia in Korea as of 2010 were estimated to be US$ 7.4 billion, and they are predicted to double every 10 years and reach US$ 37.3 billion by 2050, exceeding 1.5% of GDP [4]. Measures must be taken, as the increase in the number of seniors with dementia leads to considerable losses, not only for the patients but also for supporting families, local communities and the country as a whole.
Dementia is known to be a disease associated with the gradual decline of cognitive functions for which full recovery is impossible. Reports suggest that cognitive decline in dementia can be postponed if cognitive functions are systematically managed with medicines, such as cholinesterase inhibitors, in the early stages of dementia [5]. Thus, the focus is now on the treatment and early detection of dementia. In particular, delaying the onset of dementia, even for just two years, with early detection and treatment can lower its prevalence rate by 20% and decrease dementia patients' problem behaviours [6]. Thus, the early detection of dementia is crucial from a clinical perspective.
The early detection of dementia is performed based on interviews, standardised neuropsychological tests and neurological tests. Among them, neuropsychological tests composed of assessments of cognitive/verbal functions have been widely used as screening tests to detect early dementia [7]. In particular, as the usefulness of verbal ability for detecting dementia has been verified [8], verbal tests have been emphasised as effective screening tests for dementia. Nevertheless, few Korean studies have investigated the characteristics of the cognitive and verbal functions of the elderly using standardised neuropsychological test tools.
Meanwhile, as pattern analysis becomes possible on big data, data-mining analysis, which detects the possible onset of a disease by drawing out reliable conclusions based on data, is gaining attention in the healthcare area. In particular, random forest, which is a machine-learning algorithm using the bagging approach, has high accuracy and predictive power, because it predicts the final target variables after creating and combining multiple decision trees with random sampling [9,10].
This study developed an early dementia prediction model for Korean seniors based on the random forest algorithm and compared its results and precision with those of a logistic regression model and decision tree model based on chi-square automatic interaction detection (CHAID).
This study is organised as follows: Section II describes the study participants and analysed variables, and Section III defines random forest and explains the model development procedure. Section IV compares the results of the developed prediction model with those of existing models. Lastly, Section V presents conclusions and suggestions for future studies.

A. Study participants
Data were collected from face-to-face interviews with voluntary participants aged 60-90 living in Seoul and Incheon. Subjects with depression and those taking medicines that hamper cognitive functions were excluded.
The seniors with cognitive impairment were selected as a group suspected of dementia by using the Korean-Mini Mental State Examination (K-MMSE) [11], and dementia was screened with the diagnostic standards for Alzheimer's dementia of the Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition [12] and standards for probable Alzheimer's disease of NINCDS-ADRDA [13]. In this study, patients with mild dementia were defined as those scoring 0.5-1 point on the Clinical Dementia Rating Scale [14]. A total of 418 seniors (135 males, 283 females) were finally analysed.

B. Measurements
Cognitive and verbal abilities were measured with the Seoul Neuropsychological Screening Battery (SNSB) [15], which is composed of cognitive tests such as attention (digit span forward, digit span backward), verbal memory (Seoul Verbal Learning Test (SVLT) immediate recall, SVLT delayed recall, SVLT recognition true positive, SVLT recognition false positive), visuospatial memory (Rey Complex

A. Development of mild dementia prediction model
To develop the mild dementia prediction model, this study divided the data into training data (70%) and test data (30%). The random forest algorithm was used to develop the prediction model, and the results of the developed prediction model were compared with those of a decision tree based on the CHAID algorithm. The accuracy of each model was evaluated with the correct classification rate, and the variable importance and the major factors identified by each model were compared.
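The 70/30 split and the correct classification rate described above can be sketched as follows. This is a minimal illustration in standard-library Python, not the procedure actually used in the study: the function names, the fixed random seed and the use of integer indices in place of participant records are all assumptions made for the example.

```python
import random

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an illustrative assumption
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def correct_classification_rate(y_true, y_pred):
    """Fraction of cases whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Hypothetical stand-in: indices 0..417 represent the 418 participant records.
train, test = split_train_test(range(418))
```

Every record ends up in exactly one of the two sets, and the correct classification rate is simply the share of test cases for which a fitted model's prediction matches the observed diagnosis.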

B. Random forest model
The random forest model is a data-mining technique that combines multiple decision trees in an ensemble classifier [16]. Random forest is composed of a training stage, which constructs multiple decision trees, and a test stage, which makes classifications or predictions when there are input vectors [17] (Figure 1).
As random forest is based on decision trees, it has a fast learning speed and the ability to process a large amount of data [18]. In addition, random forest has a higher prediction capability than a decision tree, and it can prevent overfitting [19].
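The two stages can be illustrated with a deliberately simplified sketch. It is an assumption-laden toy, not the model fitted in this study: each "tree" is a one-split stump trained on a bootstrap sample with a randomly chosen feature, whereas a full random forest grows deeper trees and resamples candidate features at every split. The data, function names and parameter values are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) pair with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Training stage: one stump per bootstrap sample and random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Test stage: majority vote over all trees."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Made-up two-test scores; label 1 = mild dementia, 0 = healthy.
rows = [(2, 3), (3, 2), (1, 4), (2, 2), (8, 9), (9, 7), (7, 8), (9, 9)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(rows, labels)
```

Because each stump sees a different resample of the data, individual stumps may disagree, and the majority vote is what stabilises the final prediction. In practice an established implementation, such as the random forest classifier in scikit-learn, would be used instead of hand-written code.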

A. General characteristics of participants
Among the 418 participants, 32.3% (n=135) were males and 67.7% (n=283) were females. The average age was 67.5 (standard deviation=4.3). Over 18.8% were high school graduates, and 76.5% were living with a spouse. Roughly 15.3% were current smokers, 26.5% were current drinkers and 33.8% exercised regularly (i.e. more than once a week). The prevalence rate of mild dementia was 8.4%.

B. Results of neuropsychological test for healthy seniors and seniors with mild dementia
The results of the neuropsychological test for healthy seniors and seniors with mild dementia are presented in Table 1. The results of the independent t-test revealed there were significant differences between healthy seniors and seniors with dementia for several factors. These included digit span forward, digit span backward, confrontational naming, RCFT copy score, RCFT immediate recall, RCFT delayed recall, RCFT recognition true positive, SVLT immediate recall, SVLT delayed recall, SVLT recognition true positive, SVLT recognition false positive, and K-CWST colour reading correct (p<0.05).
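For readers unfamiliar with the group comparison, the statistic behind an independent two-sample t-test can be sketched as below. The source does not state which variant was used, so this example assumes Welch's unequal-variance form; the scores are made up for illustration and are not study data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical delayed-recall scores for the two groups.
healthy = [7, 8, 6, 9, 7, 8]
mild_dementia = [3, 4, 2, 5, 4]
t_stat, dof = welch_t(healthy, mild_dementia)
```

The p-value is then read from the t distribution with the returned degrees of freedom, for example from a statistical table or a statistics library, and compared with the 0.05 threshold used in this section.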

C. Accuracy comparison among random forest, logistic regression model, and decision tree
The prediction model was developed by using random forest, and its accuracy was compared with those developed using a logistic regression model and a decision tree (Table 2). The results of the analysis on the training data revealed that random forest showed the highest accuracy, 72.5% (Figure 4, Figure 5). By comparison, the accuracy of the decision tree was 71.2%, and the accuracy of the logistic regression model was the lowest at 68.7%.
In the test data, random forest showed the highest accuracy at 72.1%, while the logistic regression model had the lowest accuracy at 67.5%. Hence, random forest had the highest accuracy in both the training data and test data.

D. Comparison of neuropsychological tests for prediction of dementia
The results of the prediction models established based on a logistic regression model, a decision tree and random forest by using 14 neuropsychological tests to predict mild dementia are presented in Table 3.
In the logistic regression model, the prediction of mild dementia involved 12 tests, and its accuracy was 67.7%. These tests included digit span forward, digit span backward, confrontational naming, RCFT copy score, RCFT immediate recall, RCFT delayed recall, RCFT recognition true positive, SVLT immediate recall, SVLT delayed recall, SVLT recognition true positive, SVLT recognition false positive, and K-CWST colour reading correct.
The decision tree based on CHAID involved nine tests for the prediction of mild dementia, and its accuracy was 70.8%. These tests included digit span forward, digit span backward, confrontational naming, RCFT copy score, RCFT immediate recall, RCFT delayed recall, SVLT immediate recall, SVLT delayed recall, and K-CWST colour reading correct.
Random forest involved 12 tests for the prediction of dementia, and its accuracy was 72.7%. These tests included digit span forward, digit span backward, confrontational naming, RCFT copy score, RCFT immediate recall, RCFT delayed recall, RCFT recognition true positive, SVLT immediate recall, SVLT delayed recall, SVLT recognition true positive, SVLT recognition false positive, and K-CWST colour reading correct.

V. CONCLUSION
The early diagnosis of dementia is important, because it not only reduces the number of cases that progress into dementia but also eases the individual and social burden of support for dementia patients.
In developing the early dementia prediction model for Korean seniors based on the random forest algorithm, this study identified a number of tests as important indices in detecting mild dementia. These included digit span forward, digit span backward, confrontational naming, RCFT copy score, RCFT immediate recall, RCFT delayed recall, RCFT recognition true positive, SVLT immediate recall, SVLT delayed recall, SVLT recognition true positive, SVLT recognition false positive, and K-CWST colour reading correct.
Numerous studies have reported that verbal memory, visuospatial memory and naming are effective tests for distinguishing seniors with early dementia from healthy seniors [20,21]. In particular, naming is known to be the most sensitive test for predicting the progress into dementia [8]. In addition, among the various neurological functions that decline with aging, delayed recall and selective attention have been reported to be the most sensitive items for predicting the onset of dementia from mild cognitive impairment [22]. Meanwhile, Artero et al. (2003) reported that the progress from mild cognitive impairment to dementia was best predicted when verbal memory and visuospatial ability were assessed together [23]. Moreover, in a cohort study on local communities, Dickerson et al. (2007) reported that the decline of not only verbal memory but also executive functions affects the progress into dementia [24]. These results imply that integrated assessment including verbal memory, visuospatial memory and executive function is important in predicting cognitive decline and dementia in old age.
According to the results of the comparison of the accuracies of random forest, the logistic regression model and the decision tree, the accuracy of random forest was the highest. This is presumed to be because random forest is based on a bootstrap aggregating algorithm that creates various decision trees out of 500-odd bootstrap samples. While the decision tree has a risk of overfitting, random forest has higher accuracy than the decision tree, since it is based on a bootstrap aggregating algorithm that predicts target variables through means or probability [19,25,26]. Random forest is deemed to be more effective in conducting prediction analysis by using data with many variables to measure, since it draws out multiple training data, forms trees and predicts target variables.
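A standard textbook argument, stated here as background rather than as a result of this study, helps explain why averaging over bootstrap samples tends to help. Assume B trees whose predictions are identically distributed with variance \sigma^2 and pairwise correlation \rho (the symbols are introduced only for this illustration). The variance of their average is then:

```latex
\[
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
\]
```

As B grows, the second term shrinks towards zero, so the ensemble variance approaches \rho\sigma^2. Drawing random subsets of variables when growing each tree lowers \rho, which is one commonly given reason why random forest is less prone to overfitting than a single decision tree.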
The results of this study imply that verbal memory, visuospatial memory, naming, visuospatial functions and executive functions are the cognitive domains that should be prioritised in neuropsychological assessment to screen for mild dementia. In addition, in order to effectively detect early dementia, the development of screening test programmes composed of tests with high predictive power is required.

TABLE I.
THE RESULTS OF NEUROPSYCHOLOGICAL TEST FOR HEALTHY ELDERLY AND ELDERLY WITH MILD DEMENTIA, MEAN±SD