Predicting Number of Hospital Appointments When No Data Is Available

In a hospital, the data generated by each department or section is usually treated in isolation, on the assumption that departments are unrelated: high demand in one department is not expected to say anything about demand in another. In this paper, we question this approach by considering the departments as components of one large system in the hospital. We present an algorithm that predicts the number of appointments of a department for which no data is available, using data from the other departments. The algorithm is based on multiple linear regression and uses a correlation matrix to measure the relationship between departments over different time windows. After running the algorithm for different time windows and departments, we find experimentally that precision decreases as the time window is extended to learn more dependencies in the data. Indeed, one month of data is the sweet spot for leveraging information from other departments while still providing accurate predictions. These results are important for developing per-department health policies under limited data, an interesting problem that we plan to investigate in future work.
Keywords: multiple linear regression; hospital appointments; machine learning; correlation matrix


I. INTRODUCTION
Usually, hospital data is treated as a single entity in which appointment information, resource management, and activities are treated equally. Therefore, when the hospital decides how many appointments to offer for the next month, it analyzes only the historical data of the department in question, without considering the data of other departments, since departments are assumed to have no relationship with one another.
Several studies have documented various aspects of non-attendance in hospitals in order to improve the monthly scheduling of available appointments, using only the historical data of each department.
In this paper, we question this approach by considering information from departments as components of a large system. By doing this, we take advantage of the particular dynamics between departments that explain the system behaviour. In particular, we observe that not all departments have the same data availability, but their data complement each other over time. For example, before a mother gives birth, the Emergency department shows busy schedules, but after the birth, usage of the Pediatrics department becomes more intense. Knowing how the different departments of a hospital interact allows us to predict the number of appointments for a target department, because the other departments can explain its behavior over time. In other words, when one department does not have information, we still want to be able to make predictions using the data from other departments. Nelson et al. [1] found that it is possible to use high-dimensional models of varying complexity based on logistic regression. The models were trained and evaluated and achieved good performance in predicting scheduled hospital attendance.
In this paper, we experimentally confirm that it is possible to predict appointments for the next month when no data is available for a target department, a critical problem in most hospitals. When the time window goes beyond a month, the predictions are not reliable and some appointment information for the target department is needed.
The rest of the article is organized as follows: Section II describes the related works proposed in the literature. Section III summarizes some previous concepts needed to understand the proposed prediction model. Section IV explains the general scheme of the proposed model. Section V describes the results of applying the proposed prediction model to a health appointments dataset collected at a hospital. Finally, Section VI concludes our work and presents some future extensions of the proposed model.

II. RELATED WORKS
Nelson et al. [1] found that in the United Kingdom the cost of "no-show" appointments, where patients do not attend, is around 1 000 000 000 per year. Their purpose was to find the relation between predictive models and predictive features for "no-show" appointments. They obtained data from University College Hospital and the National Hospital for Neurology and Neurosurgery. After cleaning, the data was divided into three groups (training, validation, and test) for a neural network that predicts "no-show" appointments. Their investigation showed that optimal appointment scheduling requires high-dimensional models based on machine learning.
Dashtban and Li [2] explain that "no-show" appointments lead to worse care for patients, inefficient use of human resources, and an increase in waiting time. They wanted to predict patient behavior by finding common factors. They used SSDAE (Sparse Stacked Denoising Autoencoders) to rebuild missing data and added a layer for making predictions; the oldest data was used for training and the newest for validation. Their model outperformed the other models it was compared with.
Kyambille and Kalegele [3] report that patients in Tanzania complain about the time it takes to go to the hospital and be attended. They developed a mobile application to manage appointments, which they hope will reduce patient waiting time.
Mieloszyk [4] addresses the problem that "no-show" appointments do not allow the slot to be rescheduled for other patients. The objective was to develop a system to collect appointment data, which was classified into three groups related to the patient, the exam, and the appointment schedule; linear regression was then applied to this data.
Tenagyei and Kwadwo [5] focus on manual appointment scheduling and the problems it causes. They developed a system in which patients and doctors can schedule appointments, balancing the patient load across doctors.
Mazurowski and Maciej [6] found that increasing class imbalance in the training dataset generally has a progressively detrimental effect on classifier test performance measured by AUC and 0.9 AUC. This is true for small and moderate-size training datasets that contain either uncorrelated or correlated features. In the majority of the analyzed scenarios, backpropagation provided better results. The training was more susceptible to factors such as class imbalance, small training sample size, and a large number of features. Again, this finding was true for both correlated and uncorrelated features.
Class imbalance is a common problem with most medical datasets [7]. Most existing classification methods tend not to perform well on minority class examples when the dataset is extremely imbalanced. Sampling strategies, either oversampling or undersampling, have been used to overcome the class imbalance problem, and many researchers have proposed different oversampling and undersampling methods to balance the data.
Zia and Khan [8] used the classification algorithms Naïve Bayes, decision trees, and kNN to predict diabetes. They compared their results with similar studies by other authors and found that decision trees work better than the other classifiers. The decision tree algorithms J48 and Jgraft outperform the other classifiers and previous studies, achieving the highest accuracy rate of 94.44. The decision tree is a simple and good classifier for predicting diabetes.
Data mining is gaining popularity in almost all real-world applications. Classification, one of the data mining techniques, is of interest to researchers because it classifies data accurately and efficiently for knowledge discovery. Decision trees are popular because they produce human-readable classification rules and are easier to interpret than other classification methods. In [9], frequently used decision tree classifiers are studied and experiments are conducted to find the best classifier for medical diagnosis. The experimental results show that CART is the best algorithm for the classification of medical data, and that CART also performs well on medical datasets of increased size.
The authors of [10] used two large datasets and found that did-not-attend (DNA) rates for medical appointments declined monotonically over the week. This pattern was present for both male and female patients and in all age groups, but was stronger in younger age groups. Importantly, it also generalized across national hospital and single practice settings. In line with their predictions, attendance was systematically higher on days that elicit emotionally positive associations (e.g. Friday), and lower on days that elicit emotionally negative associations (e.g. Monday). These findings raise the possibility that medical appointments may be harder to face on some weekdays than on others.
Green and Savin [11] note that health care practices are increasingly competing not only on cost but also on quality and patient satisfaction. In this environment, timely access to care has become a more important issue. As a result, physician practices are eager to embrace new approaches to patient appointment scheduling to reduce backlogs, increase productivity, and improve patient satisfaction. The authors demonstrated that the cancellation factor and its associated rescheduling probability have a significant impact on system performance and on the maximum patient panel size that can be reasonably handled by a practice. While no model is a perfect representation of reality, they believe that these models are useful for guiding patient panel decisions because they capture the essential dynamics of a patient appointment system.
Almuhaideb [12] discusses "no-show" appointments as a global problem. The data was collected in 2014, and the JRip17 and Hoeffding tree algorithms were used to model and classify the appointments. The model uses the history of "no-show" appointments to predict future "no-show" appointments for a particular patient; the model generated by JRip 17 has 13 rules and resembles a decision tree.

A. Multiple Linear Regression (MLR)
Regression models are used to describe relationships between variables by fitting a line to the observed data. Regression allows us to estimate how a dependent variable changes in response to changes in one or more independent variables.
Multiple linear regression, also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. The goal of MLR is to model the linear relationship between the explanatory variables and the response variable.
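As an illustration of the technique only (not the implementation used in this study), a minimal least-squares fit of a multiple linear regression can be sketched in Python with NumPy. The function names `fit_mlr` and `predict_mlr` and the toy data are assumptions introduced for this example.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares and return (b0, b)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def predict_mlr(b0, b, X):
    """Evaluate the fitted linear model on new rows of explanatory variables."""
    return b0 + np.asarray(X, dtype=float) @ b

# Hypothetical toy data generated exactly from y = 2 + 3*x1 - 1*x2.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]])
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]
b0, b = fit_mlr(X, y)
next_value = predict_mlr(b0, b, [[5.0, 5.0]])
```

Because the toy response is an exact linear function of the two explanatory variables, the fit recovers the generating coefficients; with real appointment counts the fit would only be approximate.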
In this study, MLR is used to formulate the problem of predicting the number of appointments for a department using information from the rest of the hospital departments when no information is available for that specific department. In the standard MLR form, the model is:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon$$
where $Y$ represents the number of appointments for the next month, $X_1, \dots, X_n$ represent the appointments of the other departments, $\beta_0, \dots, \beta_n$ are the regression coefficients, and $\epsilon$ is the error term.
The root mean square deviation, or root mean square error (RMSE), is a frequently used measure of the differences between the values predicted by an estimator and the observed values. RMSE is always non-negative, and a value of 0 would indicate a perfect fit to the data. In general, a lower RMSE is better than a higher one.
We used RMSE to measure the precision of the prediction, using its usual definition:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2}$$
where $\hat{y}_i$ is the predicted value, $y_i$ is the observed value, and $N$ is the number of predictions. For every time window, the average RMSE was calculated for each department. These averages were used to compare the cases in which the algorithm does and does not use the target department's own data to make the prediction.
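The RMSE measure and the per-department averaging described above can be sketched as follows. This is a minimal illustration in plain Python; the helper names `rmse` and `mean_rmse` and the sample numbers are assumptions, not values from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mean_rmse(per_iteration_rmse):
    """Average of the RMSE values collected over the iterations of one time window."""
    return sum(per_iteration_rmse) / len(per_iteration_rmse)

# Hypothetical observed and predicted appointment counts for three months.
example = rmse([100, 120, 90], [110, 115, 90])
# Hypothetical per-iteration errors for one department and one time window.
average = mean_rmse([4.0, 6.0, 8.0])
```

Note that when a single predicted value is compared with a single observed value, the RMSE reduces to the absolute error between the two.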

A. Algorithm
The algorithm uses different time windows to generate the RMSE matrix for each department. In each time window, it extracts data from the matrix to train the model and predict the next value. Our input is a matrix with 72 rows, representing the months in 6 years, and 41 columns, representing the departments.
Suppose that the time window value is 3. The algorithm then begins at the first position, covering January 2014 to March 2014. In each iteration, the algorithm extracts data from the matrix to generate a new sub-matrix whose size is defined by the value of the time window. The time window is advanced by one position at a time until row number 72, the last row of the matrix, is reached. The sub-matrix is used to train the model. Once the model is trained, a new input is sent to it to predict the number of appointments for the next month: the input is the appointments for April 2014, and the model predicts the appointments for May 2014. For the next iteration, a different subset of the data is created by moving the time window by one. The training input in the next iteration therefore runs from February 2014 to April 2014, which corresponds to indices 1 to 4 (zero-based, end exclusive).
The next input is then the appointments for May 2014, and the algorithm predicts the appointments for June 2014. This is the main idea of the algorithm, which is run for each defined time window. The algorithm flow is shown in Figure 1.
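The index arithmetic of the sliding window can be sketched as below. This is one possible reading of the description, assuming zero-based row indices with index 0 standing for January 2014 and an end-exclusive training range; the names `sliding_windows` and `month_label` are hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def month_label(index, first_year=2014):
    """Map a zero-based row index to a label, with index 0 = January of first_year."""
    return f"{MONTHS[index % 12]} {first_year + index // 12}"

def sliding_windows(n_months, window):
    """Yield (train_start, train_end_exclusive, input_index, target_index).

    The loop stops early enough that both the input month and the month
    to be predicted still exist in the data (an assumption of this sketch).
    """
    for start in range(n_months - window - 1):
        end = start + window
        yield start, end, end, end + 1

for start, end, inp, target in list(sliding_windows(72, 3))[:2]:
    print(f"train {month_label(start)} to {month_label(end - 1)}, "
          f"input {month_label(inp)}, predict {month_label(target)}")
```

With a window of 3, the first two iterations correspond to the January to March 2014 and February to April 2014 training ranges described in the text.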
In step 1, the time windows used to measure the precision were defined, and the algorithm goes over each of them (1, 3, 6, 12, 15, 18). Then, in step 2, the range for the next loop was defined so that it does not run past the 72 rows of the data: that loop goes from zero to the total number of months minus the value of the time window. The third loop, in step 3, goes over the 41 departments.
Step 4 generates a matrix using the start variable defined in step 2 and the end variable, which was calculated using the time window. The new data frame obtained in step 4 is used as the input for the multiple linear regression model in step 5. The algorithm was used in two different scenarios, so the data obtained in step 4 were different for each of them. In the first scenario, where the number of appointments for a given department is predicted using data from all departments, including the department to be predicted, the model was trained with 41 departments. In the second scenario, where the appointments for a department are predicted without using that department's own data, the model was trained with 40 departments.
Once the model is trained, step 6 calculates the prediction for the next month; the input for the model was the data of the month preceding the one to be predicted. The prediction is used in step 7 to calculate the RMSE between the real value and the predicted value. The RMSE of each iteration was saved, and the values were then averaged and stored in a table where each column represents a time window and each row represents a department.
The output of this algorithm is a matrix that shows the average RMSE over all the iterations for each time window. Figure 2 shows the results using all the departments, and Figure 3 shows the results without using the data of the department itself.
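The seven steps can be put together in a compact sketch. This is not the authors' code: it assumes that each training row is paired with the target department's count in the following month, that the loop stops while the input month and the predicted month still exist, and that small synthetic data stands in for the hospital matrix. The name `window_rmse_table` is hypothetical.

```python
import numpy as np

def window_rmse_table(data, windows, include_target=True):
    """Return an array (departments x time windows) of average prediction errors.

    data: array of shape (n_months, n_departments).
    include_target: keep (scenario 1) or drop (scenario 2) the target's own column.
    """
    data = np.asarray(data, dtype=float)
    n_months, n_depts = data.shape
    table = np.zeros((n_depts, len(windows)))
    for w_idx, window in enumerate(windows):                    # step 1
        errors = [[] for _ in range(n_depts)]
        for start in range(n_months - window - 1):              # step 2
            end = start + window
            for dept in range(n_depts):                         # step 3
                cols = [c for c in range(n_depts) if include_target or c != dept]
                x_train = data[start:end][:, cols]              # step 4
                # Assumption: labels are the target's counts one month later.
                y_train = data[start + 1:end + 1, dept]
                design = np.column_stack([np.ones(window), x_train])
                coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)  # step 5
                x_new = np.concatenate(([1.0], data[end, cols]))
                predicted = x_new @ coef                        # step 6
                actual = data[end + 1, dept]
                # RMSE of a single prediction equals the absolute error.
                errors[dept].append(abs(actual - predicted))    # step 7
        table[:, w_idx] = [np.mean(e) for e in errors]
    return table

# Synthetic stand-in data: 24 months, 5 departments.
rng = np.random.default_rng(0)
demo = rng.poisson(lam=50, size=(24, 5))
with_own = window_rmse_table(demo, (1, 3, 6), include_target=True)
without_own = window_rmse_table(demo, (1, 3, 6), include_target=False)
```

When a window has fewer rows than there are departments, the least-squares problem is underdetermined, and `np.linalg.lstsq` returns the minimum-norm solution; a different solver could behave differently in that regime.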
We observed that for small time windows the RMSE is nearly the same in both scenarios, and when the time window is equal to 1 the values in both scenarios are identical. This means that it is possible to predict the appointments for a target department whose data is not available using the data from other departments.

V. RESULTS
A. Experimental Details
1) Data collection:
a) Participants: Data for this project was collected from the regional hospital located in Arequipa, covering 41 departments over six years. This hospital provided more than 90 000 appointments from 2014 to 2019.
b) Number of departments: Forty-one departments were extracted and identified. The departments available in the hospital include nursing, pediatrics, gynecology, psychology, and the other 37 departments included in this study. We noticed that over those six years some departments were opened and closed each year, so only the departments common to all six years were considered in our analysis.
2) Data processing: Data was extracted from spreadsheets. The original data provided by the hospital was organized in folders, each folder representing a year, and within each year there was one Excel file per month. This data represents the overall hospital information, with the total number of appointments for each specific month, and was collected every month for six years. The resulting dataset has the following structure: every column represents a department and every row represents a month. Departments with zero appointments in more than half of the months were removed because that data is not significant for the aim of the study. Finally, the dataset was cleaned to remove departments that are not present in every year, resulting in 41 columns and 72 rows.
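The filtering rule for sparse departments and the month-by-department layout can be sketched as follows. This is a plain-Python illustration with made-up counts; the function names and the department totals are assumptions, and reading the monthly Excel files is left out.

```python
def drop_sparse_departments(monthly_counts):
    """Remove departments with zero appointments in more than half of the months.

    monthly_counts: dict mapping department name -> list of monthly totals.
    """
    kept = {}
    for dept, counts in monthly_counts.items():
        zero_months = sum(1 for c in counts if c == 0)
        if zero_months <= len(counts) / 2:
            kept[dept] = counts
    return kept

def to_month_rows(monthly_counts):
    """Arrange the data with one row per month and one column per department."""
    departments = sorted(monthly_counts)
    n_months = len(monthly_counts[departments[0]])
    rows = [[monthly_counts[d][m] for d in departments] for m in range(n_months)]
    return departments, rows

# Hypothetical monthly totals for three departments over four months.
raw = {
    "pediatrics": [10, 12, 0, 9],
    "closed_unit": [0, 0, 0, 5],
    "psychology": [0, 0, 3, 4],
}
kept = drop_sparse_departments(raw)
departments, rows = to_month_rows(kept)
```

In this toy input, `closed_unit` is dropped because three of its four months are zero, while `psychology` is kept because exactly half of its months are zero, which does not exceed the threshold.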

B. Correlation Matrix
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables. The most familiar measure of dependence between two quantities is the "Pearson product-moment correlation coefficient", commonly called simply "the correlation coefficient". Mathematically, it measures the quality of a least-squares fit to the original data.
In our study, two concepts were used: strong relation and weak relation. It is assumed that a weak relation will not affect the result and that a strong relation will have a big impact on the final result. The aim is to determine whether using only the departments with a high correlation coefficient improves the prediction when data from the department itself is not used. Figure 4 shows the coefficients between all the departments. It can be seen that some departments are strongly related, while others have no relationship between them. Experiments were run using only the departments with a correlation coefficient below 0.5, and only those with a coefficient above 0.5, to check whether the strongly related departments have a strong influence on the results.
After those experiments, we confirmed that using all the departments as inputs gives better results than using only the departments selected by the correlation threshold defined above.

C. Model Validation
The model was validated by calculating the matrix of square errors for every time window and each department. Two matrices were generated to see the differences between them: one was generated without using the data of the department itself, and the other was generated including it. Figure 5 shows the difference in RMSE between the two scenarios in which the algorithm was applied. The figure shows that for time window 1 the value is the same in both cases, which confirms that it is possible to use the data of all the other departments when we do not have data for the department itself. The figure also shows that as the time window increases, the values obtained with and without the department's own data are no longer the same. For the larger windows the two curves differ, and therefore the errors also differ, as shown in Figure 5 in the columns for the 6-month and 18-month time windows. This confirms that when one department does not have information, it is still possible to make predictions using the data from other departments.

VI. CONCLUSIONS
The information flow in a hospital is dynamic and incomplete, but often correlated. In this paper, we discussed an algorithm to predict the appointments of departments when data is not available. We defined two scenarios to show the differences in RMSE values when the department's data is used and when it is not used. After running our algorithm for different time windows and departments, we find experimentally that precision decreases as the time window is extended to learn more dependencies in the data. In contrast, the RMSE values for the smallest time window are the same in both scenarios. Indeed, one month of data is the sweet spot for leveraging information from other departments while still providing accurate predictions, which matters because many hospitals currently do not have standardized or well-organized data. These results are important for developing per-department health policies under limited data, an interesting problem that we plan to investigate in future work.