Analysis of Factors Influencing the COVID-19 Mortality Rate in Indonesia using Zero Inflated Negative Binomial Model

Abstract: This research aims to build a model and analyze the factors that influence the COVID-19 mortality rate in Indonesia. Five independent variables and one dependent variable are used. The independent variables are the percentage of poor people, the percentage of households using shared toilet facilities, the percentage of households using wood as the main fuel for cooking, the percentage of the population whose drinking water source comes from pumped water, and the percentage of the population who have health insurance from private insurance. The dependent variable is the COVID-19 mortality rate. The results obtained are as follows. First, a Zero-Inflated Negative Binomial (ZINB) regression model was obtained for COVID-19 mortality; this model can overcome overdispersion and excess zero values in the observations. Second, four independent variables have a significant effect in the count model, and no independent variable has a significant effect in the zero-inflation model. Third, a web application was produced that can display the ZINB regression model.


I. INTRODUCTION
Health is very important for every human being. With a healthy body, the soul will be good and the mind will be in balance. A healthy body and soul support human activities without obstacles. Steps that need to be taken to maintain health include exercising, getting appropriate nutrition, and avoiding habits that damage the body. Public awareness is needed to maintain a healthy lifestyle and avoid disease. COVID-19 has been a health emergency in Indonesia since the first case was reported in Depok in March 2020. The problem that Indonesia currently faces is that the government is still not effective in testing for COVID-19 cases, and rules on social distancing and mobility are not enforced [1].
Research on the factors that influence COVID-19 in Indonesia has been carried out using several different methods. The zero-inflated method used in this research is the Zero-Inflated Negative Binomial (ZINB) method. This research was conducted to model the COVID-19 mortality rate using a web-based ZINB model. The ZINB method is used because there is overdispersion in the Poisson regression model and the data contain an excessive number of zeros [2]. Overdispersion is a condition where the variance is greater than the mean [3]. One of the causes of overdispersion is a large number of zero observations in the dependent variable Y. ZINB regression is a model formed from a Poisson-gamma mixture distribution [3]. If $Y_i$ is a discrete independent random variable with $i = 1, 2, \dots, n$, the zero values of the observations are assumed to arise in two ways, corresponding to two separate states. The first state, called the zero state, occurs with probability $p_i$ and produces only zero observations, while the second state, called the Negative Binomial state, occurs with probability $(1 - p_i)$ and follows a Negative Binomial distribution with mean $\mu$, where $0 \le p_i \le 1$ [4]. The ZINB regression parameters are estimated using the Maximum Likelihood Estimation (MLE) method with the Expectation Maximization (EM) and Newton-Raphson algorithms [5]. This method is usually used to estimate the parameters of a model whose density function is known.
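The two-state mixture described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the dispersion symbol `kappa` (with variance $\mu + \kappa\mu^2$), and the example parameter values are hypothetical and do not come from the fitted model in this paper.

```python
import math

def nb_pmf(y, mu, kappa):
    """Negative Binomial probability of count y with mean mu.

    kappa is an assumed dispersion parameter: Var(Y) = mu + kappa * mu**2.
    """
    r = 1.0 / kappa
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, p_zero, mu, kappa):
    """ZINB probability: zero state with probability p_zero,
    Negative Binomial state with probability 1 - p_zero."""
    nb = nb_pmf(y, mu, kappa)
    if y == 0:
        # A zero can come from either state.
        return p_zero + (1.0 - p_zero) * nb
    return (1.0 - p_zero) * nb

# Hypothetical parameter values, for illustration only.
prob_zero = zinb_pmf(0, p_zero=0.3, mu=4.0, kappa=0.5)
total = sum(zinb_pmf(y, 0.3, 4.0, 0.5) for y in range(400))
mean = sum(y * zinb_pmf(y, 0.3, 4.0, 0.5) for y in range(400))
```

With these values the probability of a zero is larger than under the Negative Binomial state alone, and the mean of the mixture is $(1 - p_i)\mu$.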
Previous research concluded that the ZINB regression model is more appropriate for modeling data on the number of maternal deaths in Bali Province, which contain many zero values and exhibit overdispersion [9]. Another study concluded that the Poisson regression model does not satisfy the equidispersion assumption, so a Zero-Inflated Poisson (ZIP) model was proposed instead, but overdispersion was still present in the ZIP model [10].
To overcome this problem, the ZINB model and the hurdle negative binomial (HNB) model are used [11]. The Akaike Information Criterion (AIC) value of the ZINB model is lower than that of the HNB model [12]. This shows that the ZINB model is the best model for data on the incidence of diphtheria in Indonesia [13]. The HNB model can control zero values and overdispersion, just like the ZINB model [14]. However, for the data on the incidence of diphtheria in Indonesia, the ZINB model is more suitable for controlling the zero values and overdispersion of the data. Another study concludes that the appropriate model for the frequency of traveling during the last six months in South Tapanuli Regency, North Sumatra, for the March 2016 period is the Zero-Inflated Negative Binomial model [15].
The independent variables used in this research are the percentage of poor people, the percentage of households using shared toilet facilities, the percentage of households using wood as the main fuel for cooking, the percentage of the population whose drinking water source comes from pumped water, and the percentage of the population who have health insurance from private insurance. The dependent variable used in this research is the COVID-19 mortality rate. Based on the data obtained, there are areas where the COVID-19 mortality rate is zero. This causes the data to be overdispersed, that is, the mean and the variance differ. Therefore, the problem formulation in this research is:
1) Create a Zero-Inflated Negative Binomial model for the COVID-19 mortality rate in Indonesia.
2) Analyze the factors affecting the COVID-19 mortality rate in Indonesia.

A. Research Variable
The type of data used in this research is secondary data. The variables used in this research consist of the dependent variable and the independent variables (see Table I). The measurement scale of all variables is ratio.

Variable | Type | Definition
Percentage of poor people | Independent | Population whose average monthly per capita expenditure is below the poverty line. The unit used is percent (%).
Percentage of households that use shared toilet facilities | Independent | Shared toilet facilities (MCK, from the Indonesian mandi, cuci, kakus: bathing, washing, and toilet) are public facilities shared by several families for bathing, washing, and defecating in residential locations that are considered to have a fairly dense population and a low level of economic capacity. The unit used is percent (%).
Percentage of households that use wood as the main fuel for cooking | Independent | Household energy needs in rural areas are still met by firewood and agricultural waste. In rural areas, especially remote ones, firewood or biomass still supplies more than 60% of household energy needs. The unit used is percent (%).
Percentage of population whose drinking water source comes from pumped water | Independent | Pumped water is water that is pumped from a source in the ground and then distributed into the water pipes of the house or into a water tank. The unit used is percent (%).
Percentage of population who have health insurance from private insurance | Independent | Health insurance is insurance that covers the insured's medical expenses, including hospital treatment costs, surgery costs, and drug costs. The unit used is percent (%).

B. Research Steps
The data analysis techniques used in this research are descriptive analysis, Poisson regression, and Zero-Inflated Negative Binomial regression. The following is a more detailed explanation of the steps taken by the researchers in completing this research.

1) Collect the secondary data.
2) Process the data so that they are ready to be used.
3) Determine the type of each variable used.
4) Conduct a descriptive analysis by calculating the mean, median, and standard deviation, and by determining the maximum and minimum values of each variable.
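The descriptive analysis step can be sketched as follows. The helper name `describe` and the example counts are hypothetical, and the sample standard deviation is an assumption, since the text does not state whether the sample or population form is used.

```python
import statistics

def describe(values):
    """Return the descriptive statistics listed in the research steps."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation (assumed)
        "min": min(values),
        "max": max(values),
    }

# Hypothetical values of one variable, for illustration only.
summary = describe([0, 0, 1, 3, 0, 7, 2])
for name, value in summary.items():
    print(f"{name}: {value}")
```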
The following is an explanation of the steps taken by the researchers in building the Zero-Inflated Negative Binomial model (see Fig. 1).
• Equation (2) is used if there are tied ranks.
• Compare the value of the correlation coefficient with the critical value $r_{(\alpha, n)}$.
• Draw conclusions based on the results of the comparison.
2) Determine the estimated regression coefficient by using the Maximum Likelihood Estimation method and Newton Raphson iteration [16].
where the coefficients used are the coefficients obtained in step 3.

6) Perform the overdispersion test on the Poisson regression with the following steps [17]:
• Specify $H_0$ and $H_1$.
• Determine the values of the Deviance ($D$) and Pearson Chi-Squared ($\chi^2$) statistics.
• Compare the values of $D/df$ and $\chi^2/df$ with 1.
• Draw conclusions based on the results of the comparison.

7) If overdispersion is found, the model built is the Zero-Inflated Negative Binomial model.
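The Poisson fitting and overdispersion steps can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes a log link, a design matrix whose first column is the intercept, and the common convention of dividing the Deviance and Pearson Chi-Squared statistics by the residual degrees of freedom. All function names and the data are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def poisson_means(X, beta):
    """Fitted means under the log link: mu_i = exp(x_i' beta)."""
    return [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]

def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates via Newton-Raphson iteration."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start at the intercept-only fit
    for _ in range(max_iter):
        mu = poisson_means(X, beta)
        # Score vector X'(y - mu) and information matrix X' diag(mu) X.
        score = [sum(r[j] * (yi - mi) for r, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(r[j] * r[k] * mi for r, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def dispersion_ratios(y, mu, n_params):
    """Return (Deviance / df, Pearson Chi-Squared / df)."""
    df = len(y) - n_params
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    deviance = 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu))
    return deviance / df, pearson / df

# Hypothetical counts with many zeros, for illustration only.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 3, 0, 8, 1, 12, 0, 15]
X = [[1.0, float(v)] for v in x]
beta = fit_poisson(X, y)
mu = poisson_means(X, beta)
dev_ratio, pearson_ratio = dispersion_ratios(y, mu, n_params=len(beta))
overdispersed = dev_ratio > 1 and pearson_ratio > 1
```

Ratios clearly above 1 point to overdispersion, which is the condition under which step 7 moves on to the ZINB model.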

C. Analyzing
After the previous steps have been taken, the steps taken to analyze the factors that influence the COVID-19 mortality rate are as follows:
1) Determine the estimated regression coefficients using the Maximum Likelihood Estimation method and Newton-Raphson iteration.
p : number of independent variables.
n : number of observations.
3) Conduct the simultaneous test with the following steps:
a) Specify $H_0$ and $H_1$.
b) Calculate the value of the G test statistic according to Equation (5).
The test criterion is to reject $H_0$ if $W > \chi^2_{(\alpha, 1)}$. Rejecting $H_0$ at significance level $\alpha$ means that the j-th independent variable contributes significantly to the dependent variable Y.
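The two test statistics can be sketched as follows, assuming the usual forms: the likelihood ratio statistic $G = -2(\ln L_0 - \ln L_1)$ for the simultaneous test and the Wald statistic $W = (\hat{\beta}_j / SE(\hat{\beta}_j))^2$ for the partial test. These forms, the function names, the standard error, and the log-likelihood values are assumptions for illustration; only the coefficient 0.52405 is taken from this paper.

```python
# Upper 5% points of the chi-square distribution by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def g_statistic(loglik_null, loglik_full):
    """Likelihood ratio statistic comparing the null and full models."""
    return -2.0 * (loglik_null - loglik_full)

def wald_statistic(beta_hat, std_error):
    """Wald statistic for a single coefficient."""
    return (beta_hat / std_error) ** 2

# Hypothetical log-likelihoods for a model with 4 independent variables.
g = g_statistic(loglik_null=-95.0, loglik_full=-80.0)
reject_simultaneous = g > CHI2_CRITICAL_05[4]

# Reported coefficient with a hypothetical standard error of 0.2.
w = wald_statistic(beta_hat=0.52405, std_error=0.2)
reject_partial = w > CHI2_CRITICAL_05[1]
```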

III. RESULT AND DISCUSSION
In this research, a web-based application was built. Users must first register their personal data to create an account. After the account has been created, the user is directed to the Login page to enter the registered Username and Password. If the Username and Password match, the user can access the features of the application. After a successful login, the user is directed to the home menu, which has a navigation bar containing the application's features. Users can view the results of previous mortality rate predictions or make new predictions of the COVID-19 mortality rate based on the existing variables. When finished, the user can log out of the account by pressing the logout button. The use cases in this research can be seen in Fig. 2.

A. Correlation Analysis
The independent variables tested were the percentage of poor people ($X_1$), the percentage of households using shared toilet facilities ($X_2$), the percentage of households using wood as the main fuel for cooking ($X_3$), the percentage of the population whose drinking water source came from pumps ($X_4$), the percentage of the population who had health insurance from Private Insurance ($X_5$), Area Height ($X_6$), Number of Hospitals ($X_7$), Number of Doctors ($X_8$), and Percentage of Households with Ground Floor ($X_9$). The correlation values can be seen in Table II. For example, the correlation between $X_1$ and Y is -0.4570287, which indicates a moderate, negative relationship between the percentage of poor people and the COVID-19 mortality rate: as the percentage of poor people increases, the mortality rate tends to decrease.
Based on Table II, the absolute correlation values of several independent variables are greater than the critical value $r_{(0.05, 29)}$, which means that these independent variables have a significant relationship with the dependent variable. The influential variables are the percentage of poor people ($X_1$), the percentage of households using shared toilet facilities ($X_2$), the percentage of households using wood as the main fuel for cooking ($X_3$), the percentage of the population whose drinking water source came from pumps ($X_4$), and the percentage of the population who had health insurance from Private Insurance ($X_5$). Therefore, the variables Area Height ($X_6$), Number of Hospitals ($X_7$), Number of Doctors ($X_8$), and Percentage of Households with Ground Floor ($X_9$) are excluded from the modeling to be carried out.
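A rank correlation that handles tied ranks can be sketched as follows. The text mentions tied ranks, so this sketch assumes a Spearman-type coefficient computed as the Pearson correlation of average ranks; whether this matches the authors' Equation (2) is an assumption, and the function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data with one tie in x, for illustration only.
r_tied = spearman([1, 2, 2, 3], [1, 2, 3, 4])
r_negative = spearman([1, 2, 3, 4], [40, 30, 20, 10])
```

The absolute value of the coefficient is then compared with the tabulated critical value for the chosen significance level and sample size.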

B. Poisson Distributed Data Test
From the test, the test statistic value is 0.4286. This result was compared with the Kolmogorov-Smirnov critical value at (0.05; 7), which is 0.48343. Because the test statistic is smaller than the critical value, we fail to reject $H_0$; that is, the data come from a population that follows the Poisson distribution. Based on this conclusion, the regression model to be built is the Poisson regression model.
Table III shows the coefficient values, standard errors, test statistics, and conclusions for each Poisson regression variable. The influential variables are the percentage of poor people ($X_1$), the percentage of households using shared toilet facilities ($X_2$), the percentage of households using wood as the main fuel for cooking ($X_3$), the percentage of the population whose drinking water source came from pumps ($X_4$), and the percentage of the population who had health insurance from Private Insurance ($X_5$). The model is formed based on the values shown in Table III.
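The Kolmogorov-Smirnov statistic for a Poisson hypothesis can be sketched as follows. The sketch assumes that the Poisson mean is estimated by the sample mean and that the statistic is the largest absolute gap between the empirical and the Poisson cumulative distribution functions; the function names and data are hypothetical and are not the data of this research.

```python
import math

def poisson_cdf(k, lam):
    """P(Y <= k) for a Poisson distribution with mean lam."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def ks_statistic_poisson(data):
    """Largest gap between the empirical CDF and the fitted Poisson CDF.

    Both CDFs are step functions that jump only at integers, so it is
    enough to check the integers from 0 to max(data).
    """
    n = len(data)
    lam = sum(data) / n  # Poisson mean estimated from the sample (assumed)
    d = 0.0
    for k in range(0, max(data) + 1):
        empirical = sum(1 for v in data if v <= k) / n
        d = max(d, abs(empirical - poisson_cdf(k, lam)))
    return d

# Hypothetical counts, for illustration only.
d_stat = ks_statistic_poisson([0, 0, 1, 1, 2])
fail_to_reject = d_stat < 0.48343  # critical value quoted in the text
```

Because the mean is estimated from the same data, the tabulated critical value is only approximate in this setting.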

C. Zero-Inflated Negative Binomial Modeling
The Zero-Inflated Negative Binomial (ZINB) regression model can be used to model data in which the dependent variable follows a Poisson distribution, contains many zero observations, and exhibits overdispersion.

D. Best Model Selection
The best model is selected by looking at the Akaike Information Criterion (AIC) value. The selection is done by comparing the two models that have been formed, namely the Poisson regression model and the Zero-Inflated Negative Binomial model. The AIC values of the two models can be seen in Table VI. Table VI shows that the lowest AIC value belongs to the Zero-Inflated Negative Binomial model without the variable percentage of the population who have health insurance from private insurance. This variable was excluded because it had no significant effect in either the count model or the zero-inflation model. Therefore, the Zero-Inflated Negative Binomial modeling was carried out without this variable.
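The comparison can be sketched with the standard definition $AIC = 2k - 2\ln L$, where $k$ is the number of estimated parameters. The log-likelihoods and parameter counts below are hypothetical and are not the values of Table VI.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better model."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical log-likelihoods and parameter counts, for illustration only.
candidates = {
    "Poisson": aic(log_likelihood=-150.0, n_params=6),
    "ZINB": aic(log_likelihood=-95.0, n_params=11),
}
best_model = min(candidates, key=candidates.get)
print(candidates, best_model)
```

The penalty term $2k$ means that a model with more parameters is preferred only if its log-likelihood improves enough to offset the added complexity.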

E. Model Interpretation
The ZINB model is used to deal with overdispersion in the Poisson regression model. The ZINB model is divided into two components, namely the count model for $\mu$ and the zero-inflation model for $p_i$. The interpretation of the ZINB model is based on the odds ratio value, obtained from exp(β).
1) The interpretation of the count model coefficients is as follows:
a) The constant is 2.13703, meaning that if the percentage of poor people ($X_1$), the percentage of households using shared MCK facilities ($X_2$), the percentage of households using wood as the main fuel for cooking ($X_3$), and the percentage of the population whose drinking water source comes from pump water ($X_4$) are all zero, then the COVID-19 mortality rate is exp(2.13703) = 8.474232.
b) The coefficient of $X_1$ is 0.52405, meaning that every 1 percent increase in the percentage of poor people ($X_1$) multiplies the COVID-19 mortality rate by exp(0.52405) = 1.688854, if the other variables are constant.
c) The coefficient of $X_3$ is -0.26860, meaning that every 1 percent increase in the percentage of households that use wood as the main fuel for cooking ($X_3$) multiplies the COVID-19 mortality rate by exp(-0.26860) = 0.764449, a reduction of about 23.6%, if the other variables are constant.
d) The coefficient of $X_4$ is 0.15242, meaning that every 1 percent increase in the percentage of the population whose drinking water source comes from pump water ($X_4$) multiplies the COVID-19 mortality rate by exp(0.15242) = 1.164649, if the other variables are constant.
2) The interpretation of the zero-inflation model coefficients is as follows:
a) The constant is -27.395, meaning that if the percentage of poor people ($X_1$), the percentage of households using shared MCK facilities ($X_2$), the percentage of households using wood as the main fuel for cooking ($X_3$), and the percentage of the population whose drinking water source comes from pump water ($X_4$) are all zero, then the odds that the COVID-19 mortality rate belongs to the zero state are exp(-27.395) = 1.266E-12.
b) The coefficient of $X_1$ is 7.081, meaning that every 1 percent increase in the percentage of poor people ($X_1$) multiplies the odds that the COVID-19 mortality rate is zero by exp(7.081) = 1189.157, if the other variables are constant.
c) The coefficient of $X_2$ is -17.144, meaning that every 1 percent increase in the percentage of households using shared MCK facilities ($X_2$) multiplies the odds that the COVID-19 mortality rate is zero by exp(-17.144) = 3.58E-08, if the other variables are constant.
d) The coefficient of $X_3$ is -2.119, meaning that every 1 percent increase in the percentage of households that use wood as the main fuel for cooking ($X_3$) multiplies the odds that the COVID-19 mortality rate is zero by exp(-2.119) = 0.1201, if the other variables are constant.
e) The coefficient of $X_4$ is -9.382, meaning that every 1 percent increase in the percentage of the population whose drinking water source comes from pump water ($X_4$) multiplies the odds that the COVID-19 mortality rate is zero by exp(-9.382) = 8.42E-05, if the other variables are constant.
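The exponentiated values quoted above can be reproduced with a few lines of code. The coefficients are the ones reported in this section; the dictionary keys are hypothetical labels chosen for readability.

```python
import math

# Count model coefficients as reported in the interpretation above.
count_coefs = {
    "constant": 2.13703,
    "poor people": 0.52405,
    "wood fuel": -0.26860,
    "pump water": 0.15242,
}

# Zero-inflation model coefficients as reported in the interpretation above.
zero_coefs = {
    "constant": -27.395,
    "poor people": 7.081,
    "shared toilet": -17.144,
    "wood fuel": -2.119,
    "pump water": -9.382,
}

# exp(beta) gives the multiplicative change per one-unit increase.
rate_ratios = {name: math.exp(b) for name, b in count_coefs.items()}
odds_ratios = {name: math.exp(b) for name, b in zero_coefs.items()}

for name, value in rate_ratios.items():
    print(f"count model, {name}: {value:.6f}")
for name, value in odds_ratios.items():
    print(f"zero-inflation model, {name}: {value:.4g}")
```

A value above 1 indicates an increase and a value below 1 a decrease, in the mortality rate for the count model and in the odds of a zero for the zero-inflation model.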

F. Parameter Significance Test Results
Based on Table VII, it can be concluded that in the count model there are four variables that have a significant effect on the COVID-19 mortality rate. In the zero-inflation model, there are no independent variables that significantly affect the COVID-19 mortality rate. Based on the two models, it can be concluded that the variables used are not appropriate for the zero-inflation model.

Variable | Count Coefficient | Zero-Inflation Coefficient
Percentage of poor people ($X_1$) | Significant | Not significant
Percentage of households using shared toilet facilities ($X_2$) | Significant | Not significant
Percentage of households using wood as the main fuel for cooking ($X_3$) | Significant | Not significant
Percentage of population whose drinking water source comes from pump water ($X_4$) | Significant | Not significant

IV. CONCLUSION
The conclusions obtained from this research are as follows. First, the factors that influence the COVID-19 mortality rate in the count model are the percentage of poor people ($X_1$), the percentage of households using shared toilet facilities ($X_2$), the percentage of households using wood as the main fuel for cooking ($X_3$), and the percentage of the population whose drinking water source came from pumps ($X_4$). Second, in the zero-inflation model, there are no factors that affect the COVID-19 mortality rate, so the ZINB regression model used is the count model. Third, based on the evaluation of user satisfaction, the designed application is able to help predict COVID-19 mortality and to provide information and insight to the public about COVID-19.