Comparing the Accuracy and Developed Models for Predicting the Confrontation Naming of the Elderly in South Korea using Weighted Random Forest, Random Forest, and Support Vector Regression

Since dementia patients clearly show the retrogression of linguistic ability from the early stage, evaluating cognitive and language abilities is very important when diagnosing dementia. Among them, naming is an essential item (sub-test) that is always included in the dementia-screening test. This study developed confrontation naming prediction models using support vector regression (SVR), random forest, and weighted random forest for the elderly in the community and identified an algorithm showing the best performance by comparing the accuracy of the models. This study used 485 elderly subjects (248 men and 237 women) living in Seoul and Incheon who were 74 years old or older. Prediction models were developed using SVR, random forest, and weighted random forest algorithms. This study revealed that the root mean squared error of weighted random forests was the lowest when comparing the prediction performance using models based on SVR, random forest, and weighted random forest. Future studies are needed to compare the prediction performance of weighted random forest with other machine learning models by calculating various performance indices such as sensitivity, specificity, and harmonic mean using data from various fields to prove the superior prediction performance of weighted random forest.
Keywords—Confrontation naming; generative naming; support vector regression; random forest; weighted random forest


I. INTRODUCTION
The elderly population is rapidly increasing worldwide as life expectancy is extended by improved socioeconomic conditions and advances in medical science. In particular, aging is progressing faster in South Korea than in Europe, the United States, and Australia because South Korea has experienced an increase in the elderly population and a low birth rate at the same time. South Korea entered an aged society in 2017, when the proportion of the elderly population (65 years old or older) exceeded 14% [1]. It is also forecasted that South Korea will become a super-aged society in 2026, when the proportion of the elderly population will exceed 20% [1]. As the elderly population increases, the occurrence of senile diseases also increases. In particular, the incidence of dementia has rapidly increased, and it was forecasted to reach 633,000 in 2020, a large increase from 220,000 in 2010 [2]. As the number of patients with dementia increases, geriatric medicine has actively studied the characteristics of early dementia and the early detection of dementia [3,4,5].
Communication abilities, as well as cognitive abilities such as memory, deteriorate markedly in the aging process. Kang et al. (2001) [6] reported that 41.4% of the elderly population in South Korea experienced several difficulties in communication during daily life activities. As aging progresses, the elderly gradually have more difficulties in understanding and expressing language [7,8], and also experience difficulties in inference and reminiscence [9]. In particular, previous studies [10,11] that evaluated the linguistic performance of healthy elderly people revealed that the elderly had an inferior generative naming ability, that is, the ability to freely recall words, compared to young adults.
Recently, confrontation naming has drawn attention as an effective differentiation indicator of senile cognitive disorders such as dementia. Since patients with dementia clearly show the retrogression of linguistic ability from the early stage, evaluating cognitive and language abilities is essential when diagnosing dementia [12,13,14]. Among these evaluations, naming is an essential item (sub-test) that is always included in the dementia screening test. It has been forecasted that the number of dementia patients will increase as the proportion of the elderly population increases. Therefore, accurately understanding the risk factors of cognitive disorders, diagnosing them early, and providing appropriate rehabilitation accordingly are crucial issues in the field of geriatrics and gerontology [15].
Over the past decade, supervised learning-based machine learning algorithms such as support vector regression (SVR), weighted random forest, and random forest have been widely used as a way to identify complex risk factors of diseases [16,17,18]. Although ensemble machines have been reported to have better prediction performance in classifying binary data such as the presence or absence of diseases compared to decision trees such as classification and regression trees [19,20,21], most studies used regression models and decision trees to predict cognitive disorders in old age by using demographic and other factors [22,23], and only a few studies have used ensemble machines. In addition, as far as we are aware, no study has attempted to predict the communication characteristics of healthy South Korean elderly people in the normal aging process using an ensemble machine. This study developed confrontation naming prediction models using SVR, random forest, and weighted random forest for the elderly in the community and identified an algorithm showing the best performance by comparing the accuracy of the models.
www.ijacsa.thesai.org

A. Subjects
This study used 485 elderly subjects (248 men and 237 women) living in Seoul and Incheon who were 74 years old or older. Selection criteria were (1) those without a history of neurological diseases such as stroke or Parkinson's disease, (2) those who scored 24 points or higher on the Korean version of the Mini-Mental State Exam (K-MMSE) and thus fell within the normal range, (3) those without visual or hearing impairments that would interfere with the study, and (4) those who did not have depression according to the results of the Korea-Geriatric Depression Scale Short form (K-GDS-S). Power analysis was conducted using G*Power version 3.1.9.7 (Universität Mannheim, Mannheim, Germany) (Fig. 1). With nine predictor variables, alpha = 0.05, power (1-β) = 0.95, and an effect size (f²) of 0.25, the required number of samples was 400, indicating that the sample size of this study exceeded the appropriate sample size to conduct statistical tests (Fig. 2).

B. Definition of Measurements and Variables
This study measured the confrontation naming ability by using the short form of the Korean-Boston Naming Test (K-BNT-15) because the elderly have limited attention ability and it is difficult to examine them for a long time [24]. K-BNT-15 is a task that evaluates the confrontation naming ability by having the subject look at a presented picture and say its name. It gives one point per correct answer, and the total score is 15 points. The cut-off score is eight points [24].
Executive function, visuospatial ability, memory, attention concentration, language function, and orientation were measured using the Korean Version of Montreal Cognitive Assessment (K-MoCA) [25]. K-MoCA is a standardized cognitive screening test that can effectively discriminate various dementia patients including mild cognitive impairment (MCI) and vascular dementia. It is composed of multiple aspects of executive functions (4 points; trail-making B task, a phonemic fluency task, and a verbal abstraction task), visuospatial abilities (4 points; a three-dimensional cube copy and a clock-drawing task), memory (5 points; the short-term memory recall task), sustained attention task (6 points; number memorization, target detection using tapping, and subtracting by 7 from 100), language (5 points), and orientation (6 points), and the total score of it is 30 points.
Generative naming was measured using both the semantic fluency test and the phonetic fluency test among the items of the Controlled Oral Word Association Test (COWAT), a sub-test of the Seoul Neuropsychological Screening Battery (SNSB) [26]. The semantic fluency task requires the activation of the lexical-semantic network, and the subject was asked to say words within the "animal" category for one minute. The phonetic fluency test requires the activation of the phonetic-lexical network, and this study tested only the "k" phoneme. The examiner recorded all responses spoken by the subject for one minute, in order, on the response sheet, and the score was calculated by counting the total number of correct words.
Picture description was measured using the task of observing and describing "seashore", an item in the spontaneous speech section of the Korean version of the Western Aphasia Battery (K-WAB) [27]. This study calculated the correct information unit (CIU) ratio (%) according to Eq. 1, indicating the proportion of words providing appropriate and correct information among the descriptions of the "seashore".
$$\text{CIU ratio (\%)} = \frac{\text{Number of CIUs}}{\text{Total number of words}} \times 100 \tag{1}$$
Working memory was measured using the Digit Span test, a subtest of the Korean Wechsler Adult Intelligence Scale (K-WAIS) [28]. In the Digit Span test, the subject repeats, forward or backward, the numbers called out by an examiner, and the result reflects working memory. Digit span-forward starts with three numbers, and the number of numbers to be memorized increases by one at each step. The seventh and last step has nine numbers to be memorized. Each step has two trials, and the second trial is conducted only when the subject fails the first trial. It was scored by recording the number of digits of the step accurately performed by the subject, and the total score is 14 points. Digit span-backward is a task in which the subject listens to a series of numbers and repeats them in reverse order, and it was conducted and scored in the same way as digit span-forward.
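As a worked illustration of the CIU ratio in Eq. 1, the following minimal Python sketch computes the percentage from two counts. The function name `ciu_ratio` and the example counts (45 CIUs out of 60 words) are hypothetical and are not taken from the study data.

```python
def ciu_ratio(num_cius: int, total_words: int) -> float:
    """Return the CIU ratio (%): number of CIUs / total number of words * 100."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= num_cius <= total_words:
        raise ValueError("num_cius must lie between 0 and total_words")
    return num_cius / total_words * 100.0


# Hypothetical transcript: 45 of the 60 words spoken are correct information units.
example_ratio = ciu_ratio(45, 60)
print(f"CIU ratio: {example_ratio:.1f}%")
```

A ratio close to 100% would mean that nearly every word in the description carried appropriate and correct information.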
Depression was measured using the Short form of Geriatric Depression Scale (SGDS). Sheikh & Yesavage (1986) [29] developed the SGDS based on diagnostic validity studies on the existing Geriatric Depression Scale (GDS). They selected the 15 items showing the highest correlation with depression out of the 30 items of the GDS. At the time of development, they reported that the correlation coefficient (r) between the GDS and the SGDS was 0.84, indicating a strong correlation. The cut-off score defining depression was set as 6 points based on the results of previous studies [30,31].
Explanatory variables were age, gender (male, female), educational level (middle school graduate or below, and above middle school graduate), mean monthly household income (<1.5 million KRW, ≥1.5 million KRW and <2.5 million KRW, and ≥2.5 million KRW), marital living status (living with a spouse, bereavement, and separation from a spouse), smoking (non-smoking and smoking), drinking (non-drinking and drinking), working memory (total score), picture description (CIU ratio), prevalence of depression, generative naming (total score), executive function (total score), visuospatial ability, memory (total score), attention concentration (total score), language function (total score), and orientation (total score). Table I shows the results of descriptive statistics on the general characteristics of the subjects.

C. Development of a Confrontation Naming Prediction Model for Elderly People in South Korea
SVR is a regression model based on the support vector machine (SVM); it extends SVM so that it can be applied to regression analysis [32]. It predicts arbitrary real values by introducing an ε-insensitive loss function [32]. SVR has the advantage of high explanatory power even for data with nonlinearity or complex patterns. On the other hand, it has the disadvantages that it requires a long learning time due to high computational complexity and that the model is difficult to interpret because the direct relationship between the independent variables and the dependent variable cannot be analyzed. Moreover, SVR uses a kernel function for nonlinear expansion, mapping a nonlinear problem that cannot be handled linearly in the original space into a high-dimensional feature space where it becomes a linear regression problem. Linear, polynomial, and radial basis kernel functions are generally used for this process. The concept of SVR is presented in Fig. 3.
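The ε-insensitive loss mentioned above can be illustrated with a short Python sketch. It shows only the loss itself (errors inside a tube of half-width ε cost nothing, errors outside are penalized linearly), not a full SVR solver. The function names and the example values (ε = 0.5) are illustrative assumptions, not settings from this study.

```python
def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    residual = abs(y_true - y_pred)
    return max(0.0, residual - epsilon)


def mean_epsilon_insensitive_loss(ys, preds, epsilon: float) -> float:
    """Average the epsilon-insensitive loss over paired observations."""
    if len(ys) != len(preds) or len(ys) == 0:
        raise ValueError("ys and preds must be non-empty and of equal length")
    total = sum(epsilon_insensitive_loss(y, p, epsilon) for y, p in zip(ys, preds))
    return total / len(ys)


# Hypothetical naming scores: the first prediction falls inside the tube,
# the second falls outside it and is penalized.
inside = epsilon_insensitive_loss(10.0, 10.3, epsilon=0.5)
outside = epsilon_insensitive_loss(10.0, 12.0, epsilon=0.5)
print(inside, outside)
```

A smaller ε makes the tube narrower, so more residuals contribute to the loss and the fitted function follows the training data more closely.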
Random forest is an ensemble technique that generates multiple tree models using bootstrap samples and predicts the outcome by combining the models. Random forest does not use all p explanatory variables at each split; instead, it splits each tree node using m randomly selected explanatory variables, where m is smaller than p. Because it uses bootstrap samples, random forest has the advantage of being able to use out-of-bag (OOB) samples [34,35]. The importance of each variable can be easily calculated through permutation, and the mean squared error (MSE) of the OOB sample is calculated using the regression tree model generated from the bootstrap sample. The concept of random forest is presented in Fig. 4.
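The two sources of randomness described above, bootstrap sampling (which leaves some observations out of bag) and the random choice of m out of p explanatory variables, can be sketched with the Python standard library. The helper names are hypothetical, and p = 17 and m = 5 are illustrative assumptions rather than settings reported in this study; only the sample size (485) and the seed value come from the text.

```python
import random


def bootstrap_and_oob(n_samples: int, rng: random.Random):
    """Draw a bootstrap sample of row indices and return it with the OOB indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]  # with replacement
    in_bag_set = set(in_bag)
    oob = [i for i in range(n_samples) if i not in in_bag_set]  # never drawn
    return in_bag, oob


def random_feature_subset(p: int, m: int, rng: random.Random):
    """Pick m of the p explanatory variables (without replacement) for one split."""
    if not 0 < m <= p:
        raise ValueError("m must satisfy 0 < m <= p")
    return sorted(rng.sample(range(p), m))


rng = random.Random(123789)  # fixed seed so the draw is repeatable
in_bag, oob = bootstrap_and_oob(485, rng)
features = random_feature_subset(p=17, m=5, rng=rng)  # assumed p and m
print(len(set(in_bag)), len(oob), features)
```

The OOB rows were not used to grow the tree, so they can serve as a built-in validation set for that tree.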
Weighted random forest is an ensemble technique that conducts model averaging by applying a different weight to each tree model according to its performance. Since a random forest is generated by bootstrapping, it may be composed of both tree models showing good performance and tree models showing bad performance. If model averaging is performed by giving more weight to good tree models, it can provide better prediction power than the existing random forest, which gives equal weight to every tree. The weighted random forest algorithm was developed based on this concept (Fig. 5). Weighted random forest also uses OOB samples, as random forest does. For b = 1, …, B, when the MSE e(b) of an OOB sample O(b) is calculated with the tree model Tr(fb) generated from the b-th bootstrap sample θ(b), a model with a large e(b) is assumed to be a bad tree model and a model with a small e(b) a good tree model. Weighted random forest is defined as a model averaging technique that gives a weight to each tree model Tr(fb) based on the calculated e(b). This study used Akaike weights [37], which are used for AIC-based model selection, as the weights.
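The weighting idea can be sketched as follows. The text does not state the exact formula that maps e(b) to a weight, so this sketch makes an explicit assumption: it plugs the OOB MSE of each tree into the standard Akaike weight form, w_b ∝ exp(-Δ_b / 2), with Δ_b = e(b) - min e. The function names and the toy numbers are hypothetical.

```python
import math


def akaike_style_weights(oob_errors):
    """Turn per-tree OOB errors into normalized weights (assumed Akaike-weight form)."""
    if not oob_errors:
        raise ValueError("oob_errors must be non-empty")
    best = min(oob_errors)
    # Assumption: delta_b = e(b) - min e plays the role of the AIC difference.
    raw = [math.exp(-0.5 * (e - best)) for e in oob_errors]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_forest_prediction(tree_predictions, weights):
    """Weighted average of the per-tree predictions for one observation."""
    if len(tree_predictions) != len(weights):
        raise ValueError("one weight per tree is required")
    return sum(w * p for w, p in zip(weights, tree_predictions))


# Toy example with three trees: OOB MSE per tree and each tree's prediction.
oob_mse = [4.0, 2.5, 6.0]
tree_preds = [11.0, 12.0, 9.0]
weights = akaike_style_weights(oob_mse)
weighted = weighted_forest_prediction(tree_preds, weights)
unweighted = sum(tree_preds) / len(tree_preds)  # ordinary random forest average
print(weights, weighted, unweighted)
```

In this toy case the tree with the smallest OOB error dominates, so the weighted prediction moves toward that tree's output relative to the equal-weight average.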

D. Evaluating the Prediction Performance of Machine Learning Models
Multiple linear regression analysis builds models by estimating regression coefficients with the least squares method. The number of decision trees developed in the random forest was limited to 100. SVR was analyzed using the linear kernel function, the most basic kernel function, with C (a parameter determining the generalization of the regression model) set to 15.0 and ε of the ε-insensitive loss function (a precision parameter) set to 0.001. This study compared the root mean squared error (%) of the developed models to compare their prediction performance. Since random forest involves randomness, the random seed was fixed to 123789 while the models were iterated.

III. RESULTS
Table II shows the root mean squared error of the confrontation naming prediction models for the elderly in South Korea, developed by using SVR, random forest, and weighted random forest. This study defined the model with the lowest root mean squared error (%) as the model with the best prediction performance. As a result of the test for prediction performance, the random forest algorithm, with a root mean squared error of 28.4%, was confirmed as the model with the best performance. Fig. 6 shows the importance of variables of the final model (random forest) for predicting the confrontation naming of the elderly living in South Korea. The final model confirmed that generative naming-meaning, generative naming-phonemes, memorizing numbers forward immediately, and memorizing numbers backward immediately were the main variables with high weight for predicting the confrontation naming of the elderly. Among them, generative naming-meaning was the most important variable in the final model.
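The comparison criterion used above, choosing the model with the lowest root mean squared error, can be sketched in a few lines of Python. The observed scores, the predictions, and the labels `model_a` and `model_b` are invented for illustration and do not reproduce Table II; the text also does not state how the error was converted to a percentage, so the sketch reports the plain RMSE.

```python
import math


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Invented confrontation naming scores and predictions from two placeholder models.
observed = [12, 9, 14, 10]
predictions = {
    "model_a": [11, 10, 12, 11],
    "model_b": [12, 10, 13, 10],
}

scores = {name: rmse(observed, pred) for name, pred in predictions.items()}
best_model = min(scores, key=scores.get)  # lowest RMSE is treated as best
print(scores, best_model)
```

Because RMSE squares the residuals before averaging, a model with a few large errors is penalized more heavily than one with many small errors.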

IV. DISCUSSION
This study explored factors related to confrontation naming using SVR, random forest, and weighted random forest for the elderly in the community. The results of this study showed that the performance of confrontation naming was significantly associated with executive functions such as generative naming-meaning, generative naming-phonemes, memorizing numbers forward immediately, and memorizing numbers backward immediately. The results of this study agreed with the results of previous studies [38,39] based on the generalized linear model (GLM), which showed that the performance of confrontation naming was significantly related to generative naming, the language domain of K-MMSE, number memorization (a test that measures working memory and attention), and the attention concentration domain. The results of this study implied that, in the healthy elderly without neurological diseases or dementia, the performance of confrontation naming was closely related to executive functions (e.g., generative naming and memorizing numbers immediately) [38] and that executive functions could be major factors in predicting naming performance. In the future, longitudinal studies are needed to prove the causal relationship between cognitive functions and confrontation naming.
In this study, the CIU ratio of the picture description task was not a significant predictor of confrontation naming. Since the CIU analysis method is mainly used to analyze the language abilities of patients with central and peripheral nervous system damage such as aphasia and dementia, it may have had little impact on confrontation naming in this study, which targeted the healthy elderly in the community.
This study revealed that the root mean squared error of weighted random forests was the lowest when comparing the prediction performance using models based on SVR, random forest, and weighted random forest. Byeon et al. (2019) [21] developed a voucher service demand prediction model using weighted random forest, similar to this study, and showed that weighted random forest had higher prediction accuracy than other machine learning methods. They suggested developing prediction models by using weighted random forest because weighted random forest, which gives more weight to well-performing tree models, showed better accuracy than the conventional random forest, which gives the same weight to all tree models.

V. CONCLUSION
This study, which analyzed the imbalanced data, also confirmed that weighted random forest has better predictive performance than random forest or SVR. It is believed that the weighted random forest will be more effective for developing prediction models for imbalanced y-variables. Future studies are needed to compare the prediction performance of weighted random forest with other machine learning models by calculating various performance indices such as sensitivity, specificity, and harmonic mean using data from various fields in order to prove the superior prediction performance of weighted random forest.