Identification of Toddlers’ Nutritional Status using Data Mining Approach

One of the problems in community health centers and health clinics is documenting toddlers’ data. The number of malnutrition cases in developing countries is quite high. If the problem of malnutrition is not resolved, it can disrupt a country’s economic development. This study identifies the malnutrition status of toddlers based on data from a community health center (PUSKESMAS) in Jogjakarta, Indonesia. Currently, patient data cannot be mapped directly into the appropriate groups of toddlers’ malnutrition status. Therefore, a data mining approach with k-means clustering is used to map the data into several malnutrition status categories. The aim of this study is to build software that can assist the Indonesian government in making decisions on preventive action against malnutrition.
Keywords: data mining; k-means clustering; malnutrition status of toddler


I. INTRODUCTION
Data mining is the process of exploring large amounts of data to discover patterns in the data. Some topics in data mining are association rule mining, data clustering and data classification. Association rule mining is a data mining technique for finding associative rules between combinations of items. Several studies apply association rule mining to identify the risk factors of early childhood caries [1], to determine the feedback pattern in alumni tracer study data at a university [2] and to visualize financial Arabic text [3]. Some studies propose classification methods to solve their research problems, for example a basic health screening system using Bayesian methods [4], detection of heart disease using decision tree methods [5], and classification of Alzheimer's disease using K-Nearest Neighbors (KNN) [6]. Several studies also implement clustering methods to perform automatic color segmentation [7], to cluster and analyze earthquake epicenters [8], and to decrease the computational load for high-dimensional data [9].
The health and nutritional status of children is one of the measures that reflect the public nutrition situation. Malnutrition is a burden not only for the family but also for the country. Therefore, the Indonesian government, through the community health centers (PUSKESMAS), has collected data on toddlers' nutritional status using an Excel-based application. However, the results cannot automatically show the grouping of the data by nutritional status. The data available in PUSKESMAS cannot yet be used to determine the nutritional status of toddlers according to the standards set by the Indonesian government. When data on the community's nutritional status are requested, the mapping process is done manually. This process is not optimal because it takes a long time and data duplication can occur when thousands of records are processed manually.
Previous research has studied malnutrition in the elderly, mothers and toddlers [10]-[12] and Child Care Health Consultation [13]. Malnutrition is both the cause and the consequence of many geriatric diseases, which account for a very significant proportion of state expenditure on health [14]. In [15], the author analyzes malnutrition using logistic regression methods and growth charts to reduce the number of children with malnutrition status. This study aims to optimize the data transactions of patients under five years old who suffer from malnutrition. The malnutrition patients are grouped according to the nutritional value of children under five years using a data mining method with the k-means clustering algorithm. A data mining approach is used in this research because data mining is widely used for predicting various procedures and the validity of data. In addition, data mining can improve decision making by finding patterns and trends in complex data [16].
The k-means clustering algorithm is also widely implemented in the medical science field, for example to identify individual characteristics using brainwave signals [17], to identify new candidate drug compounds related to lung cancer drugs [18], to recommend antiarrhythmic drugs [19], and to extract cancer signatures [20]. Other studies cluster medical data to find the direction and effectiveness of research work [21], enhance cancer subtype prediction [22], propose a color-converted segmentation algorithm for magnetic resonance imaging (MRI) brain images [23] and analyze EEG to detect drowsy driving [24].
Based on the literature review, it is important to continue the research collaboration between the data mining and medical science fields. The data used in this study are the nutrition report data from PUSKESMAS Umbulharjo Yogyakarta in 2016. The toddlers' data used in this research cover children aged 6 months to 72 months. The parameters used for grouping the nutritional status of toddlers are height, weight and age. [...] every region easily and quickly. This study aims to design and develop software that PUSKESMAS can use to identify the nutritional status of toddlers using a data mining approach, with the results to be analyzed in the decision-making process.

II. METHODOLOGY
This research studies the data mapping of malnutrition patients under five years old using a data mining approach. The grouping technique is k-means clustering. The k-means clustering algorithm is the simplest and most common algorithm for grouping objects by attribute/feature into k clusters, where k is a positive integer defined by the user. Grouping is done by minimizing the sum of squared distances between the data and the corresponding cluster centroid. The procedure of k-means clustering is shown in Fig. 1 [25].
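The sum-of-squares criterion described above is commonly written as follows. This is the standard textbook form of the k-means objective, given here as a sketch; the symbols x (a data point), S_i (the i-th cluster), S (the set of all k clusters) and \mu_i (the centroid of S_i) are notation assumed for this illustration.

```latex
\underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^{2},
\qquad
\mu_i = \frac{1}{\lvert S_i \rvert} \sum_{x \in S_i} x
```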
As shown in Fig. 1, the procedure of k-means clustering can be explained as follows:
Step 4. Repeat Step 3 until convergence is reached, i.e. until a pass through the training samples produces no new assignments.
If the number of data points is less than the number of clusters, each data point is assigned as the centroid of a cluster. Each centroid will have a cluster number. If the number of data points is greater than the number of clusters, the distance from each data point to all centroids is calculated and the minimum distance is found. The data point is said to belong to the cluster whose centroid is at the minimum distance from it. Because the final centroid locations are not known in advance, the centroid locations are adjusted based on the current cluster members. Then all data are assigned to the new centroids. This process is repeated until no data point moves to another cluster. The k-means algorithm works by using (1), where:
(X1, X2 ... Xn): the observations, each of which is a real d-dimensional vector.
μi: the mean of the points in Si.
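The assign-and-update loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the software developed in this study: it uses the common batch variant, in which centroids are recomputed after all points have been assigned, and the function names, the sample points and the initial centroids are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members.

    An empty cluster keeps its previous centroid (an assumption of this sketch).
    """
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        else:
            new_centroids.append(tuple(old))
    return new_centroids


def k_means(points, initial_centroids, max_iter=100):
    """Repeat assignment and update until no point changes cluster."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:  # no data point moved: converged
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids


# Hypothetical two-dimensional data and starting centroids
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
labels, centroids = k_means(points, [(0.0, 0.0), (1.0, 1.0)])
print(labels, centroids)
```

The loop stops as soon as an assignment pass leaves every label unchanged, which corresponds to the condition that no data point moves to another cluster.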

III. RESULT AND DISCUSSION
The data mining concept is used because it can analyze and classify a database, so that an organization can make decisions based on this classification and improve its future plans. Many data mining techniques are available for detecting hidden patterns in a database [26].
Referring to the data mining stages in Fig. 1, for nutritional status identification with the k-means clustering algorithm, the procedure begins by obtaining patient data from the patients' medical record database. Table I shows the patient data with the parameters body height, weight and age of children under five. After the data are loaded, the initial centroids are determined according to the 5 groups of toddlers' nutritional status: Bad with value 0.96, Medium with value 0.73, Good with value 0.73, Over with value 0.355 and Obesity with value 0.04. The data are normalized using the following normalization equation.

Normalized value = (initial value - minimum value) / (maximum value - minimum value)    (2)
As shown in Table I, the body height data have a minimum value of 85 and a maximum value of 104, the weight data have a minimum value of 10.3 and a maximum value of 26.5, and the age data have a minimum value of 24 and a maximum value of 48. The normalized data are shown in Table II.
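As an illustration of equation (2), the following minimal Python sketch applies min-max normalization with the minimum and maximum values quoted above. The function names and the sample record are hypothetical and are not taken from Table I.

```python
def min_max_normalize(value, minimum, maximum):
    """Scale a value to the range [0, 1] following equation (2)."""
    if maximum == minimum:
        raise ValueError("maximum and minimum must differ")
    return (value - minimum) / (maximum - minimum)


# (minimum, maximum) per parameter, as reported for Table I
RANGES = {
    "height": (85.0, 104.0),
    "weight": (10.3, 26.5),
    "age": (24.0, 48.0),
}


def normalize_record(record):
    """Normalize every parameter of one toddler record."""
    return {
        name: min_max_normalize(record[name], low, high)
        for name, (low, high) in RANGES.items()
    }


# Hypothetical record, chosen to lie midway between each minimum and maximum
sample = {"height": 94.5, "weight": 18.4, "age": 36.0}
normalized = normalize_record(sample)
print(normalized)
```

Normalizing all three parameters to the same range prevents the parameter with the largest numeric spread from dominating the distance calculation.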
If the resulting cluster assignments are not yet consistent, the centroids are updated through the iteration process. The iteration process stops if the new ratio value is less than the ratio value in the previous iteration. If this condition has not been met, the iteration process is repeated. The iteration results for the normalized toddlers' data are shown in Table III. The distance data for the first iteration are presented in Table IV. The center distance data d are presented in Table V and the new cluster center data are presented in Table VI.
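The ratio-based stopping rule can be sketched as follows. This is a hedged illustration: it assumes the ratio is the between-cluster variation (BCV) divided by the within-cluster variation (WCV), with BCV taken as the sum of pairwise distances between centroids and WCV as the sum of squared distances from each point to its centroid. These definitions, the function names and the sample data are assumptions of this sketch and are not confirmed by the source.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def between_cluster_variation(centroids):
    """BCV: sum of pairwise centroid distances (assumed definition)."""
    return sum(euclidean(a, b) for a, b in combinations(centroids, 2))


def within_cluster_variation(points, labels, centroids):
    """WCV: sum of squared point-to-centroid distances (assumed definition)."""
    return sum(
        euclidean(p, centroids[label]) ** 2 for p, label in zip(points, labels)
    )


def cluster_ratio(points, labels, centroids):
    """Ratio = BCV / WCV; larger values mean better-separated clusters."""
    wcv = within_cluster_variation(points, labels, centroids)
    if wcv == 0:
        raise ValueError("WCV is zero; the ratio is undefined")
    return between_cluster_variation(centroids) / wcv


def should_stop(new_ratio, previous_ratio):
    """Stop when the new ratio is less than the previous one."""
    return previous_ratio is not None and new_ratio < previous_ratio


# Hypothetical clustering result with two clusters
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (4.0, 1.0)]
ratio = cluster_ratio(points, labels, centroids)
print(ratio, should_stop(ratio, previous_ratio=None))
```

On the first iteration there is no previous ratio, so the sketch always continues; afterwards it continues only while the ratio does not decrease.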
Using the equation Ratio = BCV / WCV, where BCV is the between-cluster variation and WCV is the within-cluster variation, the resulting ratio is 163.594. Compared with the previous ratio, the new ratio is greater; therefore, the iteration process continues. Fig. 2 and Fig. 3 show the interface of the developed software.
www.ijacsa.thesai.org

IV. SYSTEM TEST
The system is tested using a cross validation method that compares the result of the manual k-means calculation with the result of the developed system. Based on the patient data shown in Table I, the system calculation result is presented in Table VII and the manually calculated result is presented in Table VIII.

V. CONCLUSION
This research builds software that can be used to identify the nutritional status of toddlers using a data mining technique with the k-means clustering algorithm. The test, conducted by cross validation, gives 90% validation that the system can determine the nutritional status of toddlers by producing 5 clusters, namely good nutrition, moderate nutrition, malnutrition, overnutrition and obesity.
This research is supported by the Ministry of Research, Technology and Higher Education under the research scheme Higher Education Research Cooperation (Penelitian Kerjasama Antar Perguruan Tinggi/PKAPT), grant numbers 118/SP2H/LT/DRPM/IV/2017 and PEKERTI-058/SP3/LPP-UAD/IV/2017, dated 17 April 2017.

Step 1. Begin by deciding on k, the number of clusters.
Step 2. Choose an initial partition that classifies the data into k clusters. The samples may be assigned randomly, or systematically as follows: take the first k training samples as single-element clusters, then assign each of the remaining (N-k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the cluster that gained the sample.
Step 3. Take each sample in sequence and calculate its distance from the centroid of each cluster. If a sample is not currently in the cluster with the closest centroid, move the sample to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.

TABLE I. TODDLER DATA BASED ON PUSKESMAS LOCATION

TABLE IV. DISTANCE DATA ON THE FIRST ITERATION
Table VII and the manually calculated result is presented in Table VIII.

TABLE VII. RESULTS OF CALCULATIONS WITH THE SYSTEM DEVELOPED