Chi-Square Automatic Interaction Detection Modeling for Predicting Depression in Multicultural Female Students

This study developed a depression prediction model for female students from multicultural families using a decision tree model based on the chi-squared automatic interaction detection (CHAID) algorithm. The subjects were 9,024 female students aged 12 to 15 years among the children of surveyed marriage immigrants. The outcome variable was the presence of depression. The explanatory variables were residing area, experience of career counseling, experience of social discrimination, experience of Korean language education, experience of using a multicultural family support center, Korean reading, speaking, writing, and listening skills, experience of Korean society adjustment education, need for Korean society adjustment education, need for Korean language education, and rejoined entry. In the CHAID analysis, female students from multicultural families who had experienced social discrimination within the past year and had intermediate Korean speaking skills had the highest risk of depression. Based on these results, society-level attention to the mental health of adolescents from multicultural families is needed to achieve successful social integration.
Keywords: CHAID; data mining; multicultural family; risk factors; depression


I. INTRODUCTION
The number of children from multicultural families in South Korea is increasing rapidly because of the rise in international marriages. It was 100,000 in 2010, had doubled by 2014, and is expected to exceed 300,000 by 2020 [1], [2]. In particular, because the birth rate has fallen below the population replacement rate of 2.1 children per woman, the proportion of students from multicultural families will increase steadily [1].
Nevertheless, policies for multicultural families in South Korea have mainly focused on employment and welfare, and few studies have examined the health of multicultural families [3], [4]. Moreover, previous studies on adolescents from multicultural families have mainly addressed academic performance and school adjustment; only a few have evaluated their emotional characteristics, including depression [5].
Adolescents from multicultural families are likely to experience negative emotions because of social prejudice and discrimination [5]. A previous epidemiological survey showed that adolescents growing up with low socioeconomic status (e.g., low household income) were more likely to experience depression [6]. Multicultural families are proportionally more common in rural than in urban areas, and only 9.7% of households in rural areas had an income above 30 million KRW; adolescents from multicultural families are therefore expected to be more exposed to negative emotions [7]. Byeon [8] reported that approximately 15% of adolescents (aged 19 to 23) from multicultural families had experienced social discrimination. Tienda and Haskins [9] also stated that children from international marriage families, whether born in South Korea or immigrated after birth, had difficulty adapting to society because of their social characteristics (e.g., belonging to a multicultural family) and the rapid changes of adolescence. Thus, adolescents from multicultural families are likely to carry emotional risk factors, which makes them more vulnerable to depression than other adolescents.
However, previous studies on the emotional aspects of adolescents from multicultural families mainly compared their characteristics with those of adolescents from non-multicultural families in order to identify individual risk factors [10]. Additionally, previous studies on multicultural families focused on welfare, so statistics on their health and healthcare status are lacking. Emotional problems arise from multidimensional factors such as environment, social support, and stress, so an analysis of multiple risk factors is needed to identify the characteristics of depression accurately. Recently, data mining techniques (e.g., artificial neural networks and decision trees) have frequently been used to build prediction models involving multiple risk factors [11], [12].
This study developed a depression prediction model for female students from multicultural families using a decision tree model based on the chi-squared automatic interaction detection (CHAID) algorithm. The paper is organized as follows. Section 2 describes the study participants and the CHAID algorithm. Section 3 presents the results and the predictive power of the CHAID-based model. Section 4 presents conclusions and directions for future study.
www.ijacsa.thesai.org

A. Study Participants
This study used the raw data of the 2012 Nationwide Multicultural Family Status Survey, which was conducted on multicultural families living in South Korea by the Ministry of Health, Welfare, and Family Affairs, the Ministry of Justice, and the Ministry of Gender Equality. The survey was carried out to understand the living conditions and welfare needs of multicultural families in order to develop customized policies for them [2]. Its items covered general characteristics, economic level, employment, health and health care, and marriage. The survey was conducted between July 20 and October 31, 2012. It targeted 154,333 people, comprising all marriage immigrants at the time of the survey as identified from the alien resident status data and the basic multicultural family status data of the Ministry of Public Administration and Security. Multicultural families were selected according to the Multicultural Family Law, as follows. First, families composed of an immigrant and a Korean citizen were included. Second, families composed of a foreigner who had acquired Korean citizenship through report or naturalization and a Korean who had acquired nationality by birth, report, or naturalization were included. This study targeted 9,024 female students aged 12 to 15 years among the children of the surveyed marriage immigrants.

B. Measurements
The outcome variable was the presence of depression (yes or no). The explanatory variables were residing area (rural or urban), experience of career counseling (yes or no), experience of social discrimination (yes or no), experience of Korean language education (yes or no), experience of using a multicultural family support center (yes or no), Korean reading, speaking, writing, and listening skills (each rated good, intermediate, or poor), experience of Korean society adjustment education (yes or no), need for Korean society adjustment education (necessary, average, or not necessary), need for Korean language education (necessary, average, or not necessary), and rejoined entry (entered Korea after living in a foreign country, or born and raised in Korea).

A. Exploring Predictors
General characteristics were summarized with descriptive statistics (means and percentages). Differences between subjects with and without depression were analyzed with the chi-square test.
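As an illustration of the chi-square test used for these group comparisons, the following Python sketch computes Pearson's chi-square statistic, its degrees of freedom, and the p-value for a predictor-by-depression contingency table, along with the depression rate in each predictor category. It uses only the standard library; the function names and the counts in the example are hypothetical and are not taken from the survey data. The p-value relies on closed-form expressions for the chi-square upper tail that hold only for integer degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if y <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df = 2m: P = exp(-y) * sum_{j=0}^{m-1} y**j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1: P = erfc(sqrt(y)) + sum_{j=0}^{m-1} y**(j + 1/2) * exp(-y) / Gamma(j + 3/2)
    total = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        total += math.exp((j + 0.5) * math.log(y) - y - math.lgamma(j + 1.5))
    return total


def chi_square_test(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Rows are predictor categories; columns are outcome categories
    (here: depressed, not depressed). Returns (statistic, df, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


def depression_rates(table):
    """Percentage of subjects with depression (column 0) in each predictor category (row)."""
    return [100.0 * row[0] / sum(row) for row in table]


if __name__ == "__main__":
    # Hypothetical counts: rows = discrimination (yes, no); columns = depression (yes, no).
    counts = [[10, 20], [30, 40]]
    stat, df, p = chi_square_test(counts)
    print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.3f}")
    print([round(r, 1) for r in depression_rates(counts)])
```

In a real analysis a statistics library would normally be used; the hand-written tail function is only there to keep the sketch self-contained.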

B. Chi-Squared Automatic Interaction Detection
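CHAID is an algorithm that performs multiway splits using the chi-square test or the F-test [13]. When the target variable is categorical, CHAID uses Pearson's chi-square statistic or the likelihood-ratio chi-square statistic as the splitting criterion; when the target variable is continuous, it uses the F-test [14]. The chi-square statistic is calculated from an r × c contingency table of observed frequencies $f_{ij}$. Pearson's chi-square statistic is given by (1):

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(f_{ij}-e_{ij})^2}{e_{ij}}, \qquad e_{ij} = \frac{f_{i\cdot}\, f_{\cdot j}}{f_{\cdot\cdot}} \tag{1}$$

where $e_{ij}$ is the expected frequency under independence, $f_{i\cdot}$ and $f_{\cdot j}$ are the row and column totals, and $f_{\cdot\cdot}$ is the total number of observations. The likelihood-ratio chi-square statistic is given by (2):

$$G^2 = 2\sum_{i=1}^{r}\sum_{j=1}^{c} f_{ij}\ln\frac{f_{ij}}{e_{ij}} \tag{2}$$

Both statistics have $(r-1)(c-1)$ degrees of freedom.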
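Pearson's statistic (1) sums (f_ij − e_ij)²/e_ij over all cells, and the likelihood-ratio statistic (2) is 2 Σ f_ij ln(f_ij/e_ij), where e_ij is the expected count under independence. The Python sketch below computes both directly from a table of observed counts. It is a minimal illustration that assumes no row or column total is zero; the example table is hypothetical, and cells with a zero count are skipped in (2) because f ln(f/e) tends to 0 as f tends to 0.

```python
import math


def expected_counts(table):
    """Expected frequencies e_ij = f_i. * f_.j / f_.. under independence.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(table):
    """Pearson chi-square statistic, equation (1)."""
    expected = expected_counts(table)
    return sum(
        (f - e) ** 2 / e
        for f_row, e_row in zip(table, expected)
        for f, e in zip(f_row, e_row)
    )


def likelihood_ratio_chi2(table):
    """Likelihood-ratio chi-square statistic G^2, equation (2).

    Cells with f_ij = 0 are skipped, since f * ln(f / e) -> 0 as f -> 0.
    """
    expected = expected_counts(table)
    return 2.0 * sum(
        f * math.log(f / e)
        for f_row, e_row in zip(table, expected)
        for f, e in zip(f_row, e_row)
        if f > 0
    )


if __name__ == "__main__":
    table = [[10, 20], [30, 40]]  # hypothetical 2 x 2 table
    print(round(pearson_chi2(table), 4), round(likelihood_ratio_chi2(table), 4))
```

For a table whose rows are proportional, such as [[10, 20], [20, 40]], both statistics are exactly 0: the target distribution is identical in every predictor category, so the predictor carries no information for a split.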
A chi-square statistic much smaller than its degrees of freedom indicates that the distribution of the target variable is nearly the same across the categories of the predictor variable; in that case, the predictor variable does not help to classify the target variable. The size of the chi-square statistic relative to its degrees of freedom is expressed as a p-value: the smaller the statistic relative to the degrees of freedom, the larger the p-value. Accordingly, when the chi-square statistic is used as the splitting criterion, child nodes are formed from the predictor variable with the smallest p-value, which gives the best separation.
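The selection rule described above, choosing the predictor whose chi-square test against the target has the smallest p-value, can be sketched in Python as follows. This is a simplified, hypothetical illustration rather than a full CHAID implementation: it assumes each predictor's categories have already been merged, it uses Pearson's chi-square only, and it omits the Bonferroni adjustment of p-values that CHAID implementations commonly apply. The predictor names and counts are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (closed forms)."""
    y = x / 2.0
    if y <= 0.0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        total += math.exp((j + 0.5) * math.log(y) - y - math.lgamma(j + 1.5))
    return total


def pearson_p_value(table):
    """p-value of Pearson's chi-square test for an r x c table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )
    return chi2_sf(stat, (len(rows) - 1) * (len(cols) - 1))


def select_split(tables, alpha=0.05):
    """Return (predictor, p) with the smallest p-value, or None if no p < alpha.

    `tables` maps each candidate predictor to its predictor-by-target table.
    """
    best = None
    for name, table in tables.items():
        p = pearson_p_value(table)
        if best is None or p < best[1]:
            best = (name, p)
    if best is None or best[1] >= alpha:
        return None  # no significant split: the node becomes terminal
    return best


if __name__ == "__main__":
    candidates = {
        # rows = predictor categories; columns = depression (yes, no); hypothetical counts
        "social_discrimination": [[60, 40], [200, 700]],
        "residing_area": [[130, 370], [130, 370]],
    }
    print(select_split(candidates))
```

The relationship between the statistic, its degrees of freedom, and the p-value can be checked with the same helper: with 3 degrees of freedom, chi2_sf(3, 3) is about 0.39, whereas chi2_sf(12, 3) is below 0.01.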
This study treated all variables, including the outcome variable, as categorical to make the CHAID algorithm as convenient to apply as possible [15]. In the model, the significance level for both splitting nodes and merging categories was set to 0.05, and the parent node size, child node size, and number of branches were limited to 250, 150, and 4, respectively [16]. The validity of the model was assessed by 10-fold cross-validation, and the risk estimates of the models were compared [17].
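A 10-fold cross-validation risk estimate of the kind described above can be sketched in Python as follows. The classifier is a deliberately simple, hypothetical stand-in (the majority outcome within each category of a single predictor), not the CHAID tree itself. The risk is the proportion of held-out cases that are misclassified, and the standard error uses the binomial approximation sqrt(r(1 − r)/N); that formula is an assumption here, not necessarily the one used by the software in this study.

```python
import math
import random
from collections import Counter, defaultdict


def fit_majority_by_category(xs, ys):
    """Hypothetical stand-in classifier: predict the most common outcome within each
    category of one categorical predictor (falls back to the overall majority)."""
    by_category = defaultdict(Counter)
    for x, y in zip(xs, ys):
        by_category[x][y] += 1
    overall = Counter(ys).most_common(1)[0][0]
    rule = {x: counts.most_common(1)[0][0] for x, counts in by_category.items()}
    return lambda x: rule.get(x, overall)


def cross_validated_risk(xs, ys, fit, k=10, seed=0):
    """k-fold cross-validated misclassification risk and its binomial standard error."""
    n = len(ys)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]  # k disjoint folds covering all cases
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors += sum(model(xs[i]) != ys[i] for i in fold)
    risk = errors / n
    std_err = math.sqrt(risk * (1.0 - risk) / n)
    return risk, std_err


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: outcome 1 (depressed) is more frequent when x == "yes".
    xs = [rng.choice(["yes", "no"]) for _ in range(1000)]
    ys = [int(rng.random() < (0.6 if x == "yes" else 0.2)) for x in xs]
    risk, se = cross_validated_risk(xs, ys, fit_majority_by_category)
    print(f"risk = {risk:.3f}, accuracy = {1 - risk:.1%}, SE = {se:.3f}")
```

Under this approximation, the reported cross-validated risk of 0.258 with N = 9,024 gives sqrt(0.258 × 0.742 / 9024) ≈ 0.0046, which rounds to the reported standard error of 0.005; the predictive accuracy is 1 − 0.258 = 74.2%.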

A. General Characteristics of Participants
Of the 9,024 female students, 2,627 (29.1%) had experienced depression during the past year (Table I). The chi-square tests showed significant differences (p < 0.05) between subjects with and without depression in rejoined entry, Korean speaking level, Korean listening level, Korean reading level, Korean writing level, career counseling experience, and experience of social discrimination. The depression rate was high among female students who had entered South Korea through rejoined entry (32.6%), spoke Korean at an intermediate level (42.6%), understood spoken Korean at a poor level (38.9%), read Korean at an intermediate level (37.6%), wrote Korean at a poor level (40.4%), had experienced Korean education (42.9%), had experienced career counseling, and had experienced social discrimination (53.0%).
Notes to Table II: Gain n (%): number of subjects with depression in the node and its percentage of the 2,627 subjects with depression. Response (%): proportion of subjects with depression in the node. Gain index (%): 343.9 (in total, 10 nodes).

B. Results of the Depression Prediction Model for Female Students from Multicultural Families Based on the CHAID Algorithm
The depression prediction model for female students from multicultural families based on the CHAID algorithm is shown in Fig. 1. In this model, experience of social discrimination, need for Korean society adjustment education, Korean speaking level, career counseling experience, and need for Korean language education were the important predictor variables, in that order of importance.
Table II presents the gains chart of the depression prediction model for female students from multicultural families. Of the 11 paths, 4 were confirmed to predict depression effectively. The first path was "female students from multicultural families who had experienced social discrimination within the past year and had intermediate Korean speaking skills", with a gain index of 343.0%. The second path was "female students from multicultural families who had not experienced social discrimination within the past year, had career counseling experience, and needed Korean society adjustment education", with a gain index of 246.0%. The third path was "female students from multicultural families who had not experienced social discrimination within the past year, had career counseling experience, and expressed an average need for Korean society adjustment education", with a gain index of 212.0%. The fourth path was "female students from multicultural families who had experienced social discrimination within the past year, had good or poor Korean speaking skills, and considered Korean language education unnecessary", with a gain index of 196.5%.
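The gains-chart (profit-chart) quantities for each terminal node can be computed as in the Python sketch below. It assumes the common definitions: gain (%) is the node's share of all subjects with depression, response (%) is the depression rate within the node, and the index (%) is the node response divided by the overall depression rate, times 100. The node counts in the example are hypothetical; only the totals (9,024 subjects, 2,627 with depression) come from this study.

```python
from dataclasses import dataclass


@dataclass
class GainsRow:
    gain_n: int           # subjects with depression in the node
    gain_pct: float       # % of all subjects with depression captured by the node
    response_pct: float   # depression rate within the node
    index_pct: float      # node response relative to the overall rate, x 100


def gains_row(node_n, node_events, total_n, total_events):
    """Gains-chart entries for one terminal node (assumed standard definitions)."""
    overall_rate = total_events / total_n
    response = node_events / node_n
    return GainsRow(
        gain_n=node_events,
        gain_pct=100.0 * node_events / total_events,
        response_pct=100.0 * response,
        index_pct=100.0 * response / overall_rate,
    )


if __name__ == "__main__":
    # Hypothetical node: 100 subjects, 80 with depression; totals are from the study.
    row = gains_row(node_n=100, node_events=80, total_n=9024, total_events=2627)
    print(f"gain {row.gain_pct:.1f}%, response {row.response_pct:.1f}%, index {row.index_pct:.1f}%")
```

For this hypothetical node the index is about 274.8%, meaning its depression rate is roughly 2.7 times the overall rate of 29.1%; an index above 100% marks a node with above-average risk.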
In the 10-fold cross-validation, the predictive accuracy of the model was 74.2%: the risk estimate of the cross-validated model was 0.258 (misclassification rate 25.8%, standard error 0.005), which agreed with the risk estimate of 0.259 (misclassification rate 25.9%, standard error 0.005) of the predictive model (Fig. 2).
This study developed a depression prediction model for female students from multicultural families using the CHAID algorithm and found that experience of social discrimination was the most important predictor of depression. Although the results cannot be compared directly, previous studies of the relationship between social discrimination and mental health reported that economic discrimination and discrimination against specific groups (e.g., the elderly) were significant predictors of poorer mental health [18]. Based on the results of this study, a legal framework and society-level attention are needed to overcome discrimination and prejudice against adolescents from multicultural families.
Another finding of this study was that "female students from multicultural families who had experienced social discrimination within the past year and had intermediate Korean speaking skills" had the highest risk of depression. Children from multicultural families have repeatedly been reported to lag behind children from non-multicultural families in language development [19]. It has also been reported that this difference in language development disappears as children grow older [20]. However, children from multicultural families are known to continue to have difficulties in learning Korean even in adolescence. Because immature Korean during adolescence has a decisive influence not only on academic achievement but also on social adjustment [21], continuous Korean language education for adolescents from multicultural families is needed to achieve successful social integration and prevent mental illness.
Based on the results of this study, society-level attention should be paid to the mental health of adolescents from multicultural families to achieve successful social integration.

Fig. 1. Prediction model for experience of depression symptoms in children from multicultural families.

Fig. 2. Results of the validity test.


V. CONCLUSION
It will be necessary to identify the difficulties experienced by children from multicultural families and to provide systematic programs that support their social adjustment, so that social integration can succeed.

TABLE II. GAINS CHART OF PREDICTOR VARIABLES BY CHAID ALGORITHM