Naive Bayes Classifier Algorithm Approach for Mapping Poor Families Potential

The poverty rate in Indonesia remains high, and reducing it to below 10% is a main priority for the government. Early identification of potential poverty is therefore very important for anticipating growth in the poverty rate. The Naive Bayes Classifier (NBC) is a data mining algorithm that can be used to classify poor families, here using 11 indicators and three classes. This study uses a sample of 219 poor-family data. The system, built in Java, was compared with the results of the Weka software, with a classification accuracy of 93%. The classification results were mapped by adding latitude and longitude data and a photograph showing the condition of each family's house. The mapped NBC classification results can help the government of Kabupaten Bantul examine the poverty potential of its residents.
Keywords: Data Mining; Naive Bayes; Poverty Potential; Mapping


I. INTRODUCTION
The poverty rate in Indonesia is still quite high, above 10% [12]. Finding solutions to reduce the poverty rate to below 10% has become a top priority for the government [2]. The Central Statistics Bureau (BPS) defines poverty as the inability to meet the minimum standards of basic needs, which include food and non-food needs. BPS reported that the number of poor people in Indonesia in September 2014 was still high at about 27.7 million, or approximately 10.96% of the population [11]. The poverty data are shown in Figure 1. Most of the poor people in Indonesia live on the island of Java, which accounts for 57.8% of the national total and which includes Yogyakarta province. Poverty is not measured in the same way in every country, or even in every region [8] [9]. The poverty measure, known as the poverty indicator, is the most important part of determining poverty status [8] [9]. Bantul, one of the districts in Yogyakarta, has a fairly high poverty rate, above 14%.
Classifying a person's poverty status is difficult and demands considerable effort because the results must be accurate. The Naive Bayes Classifier is a data mining algorithm that uses a probabilistic approach [1][4][5]. This research discusses how the Naive Bayes Classifier algorithm can classify the status of poor families in order to identify potential poverty based on existing indicators. Eleven indicators of poor families were used in this study, each with a defined value [10]. The indicators were food, clothing, shelter, income, health, education, wealth (rupiah), property (land), water, electricity, and the number of family members. The classes used were very poor, poor, and vulnerable poor [10] [12].

A. Poverty
Poverty is a matter of deprivation or problematic deficiency. It is a condition in which a person or a family is in a state of deprivation [2] [9]. From these definitions, poverty can be divided into two parts, absolute and relative:
a) Absolute poverty is defined as the inability to achieve a minimum standard of living. The understanding of what this minimum standard requires differs from country to country.
b) Relative poverty, on the other hand, is defined as the inability to achieve the standard of contemporary needs, which is linked to the average welfare or average income of the community at the time.
Based on the data, the factors that affect poverty are distinguished into those in rural areas and those in urban areas. The comparison is important because poverty occurs not only in rural areas but also in urban areas. Based on this geographical approach, poverty can be differentiated into rural poverty and urban poverty [9].
www.ijarai.thesai.org

B. Data Mining
Data mining is a process for obtaining useful information from a data warehouse [6] [7]. The term data mining is often used interchangeably with knowledge discovery. One data mining technique is to explore existing data to build a model and then use that model to identify patterns in other data that are not stored in the database [6].

C. Naive Bayes Classifier (NBC)
The Naive Bayes Classifier estimates the class-conditional probability by assuming that the attributes are conditionally independent given the class label Y [3] [5]. The conditional independence assumption can be expressed in the following form:
$$P(X \mid Y = y) = \prod_{i=1}^{d} P(X_i \mid Y = y)$$
where each attribute set $X = \{X_1, X_2, \ldots, X_d\}$ consists of d attributes. Features with numeric data types need special treatment before they are put into Naive Bayes. This can be done by discretization or by assuming a Gaussian distribution. The Gaussian distribution was chosen to represent the class-conditional probability $P(X_i \mid Y)$ of a continuous feature. This Gaussian approach was used by the researchers to obtain the probability value of each poverty indicator.
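For reference, the Gaussian approach is commonly written as shown below. This is the standard textbook form rather than a formula quoted from the study; the symbols $\mu_{iy}$ and $\sigma_{iy}^{2}$ (the mean and variance of indicator $i$ among training data of class $y$) are notation assumed here.

```latex
P(X_i = x_i \mid Y = y)
  = \frac{1}{\sqrt{2\pi\sigma_{iy}^{2}}}
    \exp\!\left(-\frac{(x_i - \mu_{iy})^{2}}{2\sigma_{iy}^{2}}\right),
\qquad
\hat{y} = \arg\max_{y}\; P(Y = y)\prod_{i=1}^{d} P(X_i = x_i \mid Y = y)
```

The second expression is the resulting decision rule: the predicted class $\hat{y}$ is the one for which the product of the class prior and the per-indicator probabilities is largest.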

Generic Naive Bayes Classifier algorithm:
1) Read the attributes and classes of the data set.
2) Calculate the conditional probability of each attribute given each existing class.
3) Calculate the prior probability of each existing class.
4) For each class, multiply the conditional probabilities of the attributes by the prior probability of that class.
5) Take the class with the greatest value from step four as the final classification.
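The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Java system described in this paper: the function names, the toy data, and the small variance floor are hypothetical, and logarithms are summed instead of multiplying raw probabilities, which leaves the winning class unchanged while avoiding numerical underflow.

```python
import math
from collections import defaultdict


def train(rows, labels):
    """Steps 1-3: read the data, then estimate per-class Gaussian parameters and priors."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):  # one column per indicator
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column)
            # Assumed safeguard: a tiny floor keeps the variance strictly positive.
            stats.append((mean, variance + 1e-9))
        model[label] = (prior, stats)
    return model


def gaussian_log_pdf(x, mean, variance):
    """Log of the Gaussian density for one indicator value."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


def predict(model, row):
    """Steps 4-5: combine prior and per-indicator probabilities, keep the largest."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, variance) in zip(row, stats):
            score += gaussian_log_pdf(x, mean, variance)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: two indicator scores per family (not the study's data).
rows = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [4.0, 5.0], [4.2, 5.1], [3.9, 4.8]]
labels = ["poor", "poor", "poor",
          "vulnerable poor", "vulnerable poor", "vulnerable poor"]
model = train(rows, labels)
```

Calling `predict(model, [1.1, 2.0])` then returns the class whose combined score is largest for that family.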

III. METHODOLOGY
The data used in this study were taken from poor families in Kabupaten Bantul. The overall system is shown in the block diagram in Figure 2, which is divided into three parts. The first part is the data input, which consists of the three poverty classes (poverty status) and the poverty data to be used for identification. The poverty classes are very poor, poor, and vulnerable poor. Classification uses the 11 indicators presented in Table 1. The 219 data were divided into two parts: 80% (175 data) were used as training data and 20% (approximately 44 data) were used as testing data. The second part is the main Naive Bayes Classifier process, which calculates the probability values used for classification. These values are calculated from the training data set, and the results of the training phase are the probability values that are then used for testing.
The testing phase was carried out to assess the accuracy of the resulting classification. The third part produces the poverty class, which is mapped using Google Maps to show the poverty potential of an area.
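The 80/20 division of the 219 data can be illustrated as follows. The paper does not state how the data were assigned to each part, so the seeded random shuffle, the function name `split_train_test`, and the integer stand-ins for family data are all assumptions made for this sketch.

```python
import random


def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Integers stand in for the 219 family data; real data would carry 11 indicators.
records = list(range(219))
training, testing = split_train_test(records)
```

With 219 data and a fraction of 0.8, the cut falls at 175, which leaves 44 data for testing, matching the counts given in the text.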
The training process of the Naive Bayes Classifier (NBC) algorithm can be seen in Figure 3. The testing process is shown in Figure 5. The data used for testing were 20% of the 219 data, or approximately 44 data.
In the testing phase, the probability of each class was calculated using the probability values obtained in the training phase, and the poverty classification was determined by taking the class with the largest probability. The testing phase showed that the poverty status of people in Kabupaten Bantul was identified with high accuracy. Figure 6 shows the interface of the testing menu.

IV. RESULT AND DISCUSSION
The Naïve Bayes algorithm for determining the poverty classification was implemented in Java. The classification results were used as the input for mapping poor families with Google Maps, by adding the coordinates (latitude and longitude) of each poor family's location. The results of testing the classification, shown in Figure 7, are presented in the form of a recapitulation.
In the data testing, the accuracy was 92.5%, obtained from the data of 44 poor people out of a total of 219 poor residents. The results were also compared with those from the Weka software. Before the data were processed, a preprocessing stage was carried out to examine the description of the data to be processed using NBC. The statistical description showed that the data had a mean of 1.804 and a standard deviation of 2.869, which indicates that the spread of the data is very high. After preprocessing, the classification analysis was performed using the Naive Bayes Classifier (NBC) algorithm. The Weka testing results in Figure 8 show a classification accuracy of 93.18%. Of the 44 data tested using Weka, 41 were recognized correctly, while 3 could not be identified.
The classification results in Figure 7 were used as the input for mapping the poor families, by adding latitude and longitude data as well as a photo of each family's home. The maps of the poor families are shown in Figure 9 and Figure 10. Figure 9 shows the locations of poor families in all poverty categories within a certain area. This mapping information describes the potential for poverty in a given region. Figure 10 provides detailed information about a poor family, including the Family Identification Number, the name of the head of the family, the home location, and photos of the home.
d) The implementation of the Naive Bayes Classifier algorithm built in Java can be used by decision makers in Kabupaten Bantul.
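The Weka accuracy reported in this section follows directly from the counts of correctly recognized data. A minimal check, with the helper name `accuracy_percent` chosen here for illustration:

```python
def accuracy_percent(correct, total):
    """Share of correctly classified data, expressed as a percentage."""
    return 100.0 * correct / total


# Counts taken from the Weka test described in this section: 41 of 44 correct.
weka_accuracy = accuracy_percent(41, 44)
```

Rounded to two decimal places, `weka_accuracy` is 93.18, in line with the figure reported for Weka.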

Fig. 3. Training phase of Naive Bayes
Figure 3 shows the step-by-step process of the Naïve Bayes Classifier algorithm, which includes reading the data sets. The display of the training data is shown in Figure 4.

Fig. 4. Training data display
The training input for each indicator is a Yes or No category, as can be seen in Figure 4. A Yes input corresponds to the largest score and a No input to the smallest score (see the indicator scores in Table 1).

Fig. 10. Detailed information on a poor family
This detailed data will benefit decision makers in providing aid or poverty reduction solutions.

V. CONCLUSION
From the explanation in the previous sections, the following conclusions can be drawn from the results of the study:
a) The Naive Bayes Classifier method can classify the poverty status of poor families with an accuracy of 93%.
b) The classification of the testing data produced only two classes, 25 poor and 19 vulnerable poor, of which 41 data were recognized correctly and 3 data could not be identified.