AI in Tourism: Leveraging Machine Learning in Predicting Tourist Arrivals in Philippines using Artificial Neural Network

Abstract: Tourism is one of the most prominent and rapidly expanding sectors and contributes significantly to the growth of a country's economy. However, the tourism industry was among the most adversely affected during the coronavirus pandemic. Thus, reliable and accurate time series prediction of tourist arrivals is necessary for making decisions and strategies that develop the competitiveness and economic growth of the tourism industry. To this end, this research examines the predictive capability of the artificial neural network (ANN) model, a popular machine learning technique, using the actual tourism statistics of the Philippines from 2008 to 2022. The model was trained using three distinct data compositions and evaluated using different time series evaluation metrics to identify the factors affecting model performance and to determine its accuracy in predicting arrivals. The findings revealed that the ANN model is reliable in predicting tourist arrivals, with an R-squared value of 0.926 and a MAPE of 13.9%. Furthermore, adding training data that contain an unexpected phenomenon, such as the COVID-19 pandemic, improved the prediction model's accuracy and learning process. As the technique has demonstrated its prediction accuracy, it would be a useful tool for the government, tourism stakeholders, and investors, among others, to enhance strategic and investment decisions.


I. INTRODUCTION
Big data has been used by many academics and industry professionals to make predictions and forecasts in a variety of fields, including the travel and tourism industries. Tourism is one of the most prominent and rapidly expanding sectors and contributes significantly to the growth of a country's economy. In the Philippines, travel and tourism contributed 12.7%, 5.4%, and 5.2% to the Gross Domestic Product (GDP) in 2019, 2020, and 2021, respectively [1]. This year, the tourism sector achieved another remarkable recognition when it won three significant awards at the 29th World Travel Awards (WTA). According to [2], the country was named "Asia's top tourist attraction," "dive destination," and "top beach destination" among Asian countries. According to [3], dive tourism is one of the important industries that can have a favorable impact on industry growth in terms of more visitors, longer stays, and increased tourism revenue.
However, the direct gross value contributed by the tourism industry has decreased by almost 50% over the last two years as a result of the closing of international borders and national lockdowns.
Thus, a reliable and accurate prediction of tourist arrivals is necessary for making decisions and strategies that develop the competitiveness and economic growth of the tourism industry.
A previous study [4] investigated the connection between tourist arrivals and the number of COVID-19 cases during summer. The analysis used three models: the simple linear model, the negative binomial regression model, and the Cognitive Artificial Neural Network (ANN) model. The findings showed that tourism is a significant factor in the increase in COVID-19 cases. The researchers also concluded that the ANN model makes the most accurate predictions among the three models.
Random forest (RF), artificial neural networks (ANN) [5], and support vector machines are three of the most popular AI-based models for time series prediction. However, many publications, including [6,7,8], claim that no single technique consistently delivers the best tourism forecasts; rather, the results depend on the model and technique applied, the number of observations, the features of the dataset, and the prediction horizon.
This study's objective is to forecast future tourist arrivals using historical data rather than to examine demand for tourist attractions. As already mentioned, seasonal time series forecasting is crucial for making strategic decisions and organizing upcoming tasks. The researchers examine the capability of neural network models for predicting seasonal tourist arrivals, a challenge that arises frequently in a variety of applications. In addition, this study aims to identify the factors affecting model performance and to determine the model's accuracy in predicting tourist arrivals.
The following are the primary contributions of this study:
• An exploratory data analysis that delivers more meaningful data through visualization, using a dataset that includes information about Philippine tourist arrivals.
• In contrast to other studies that concentrated on and employed a single data composition in training and testing sets, the researchers explore several data compositions.
• Leveraging machine learning, particularly the artificial neural network model as an alternative to other popular data mining techniques like random forest and support vector machine, in predicting tourist arrivals in the Philippines, and evaluating model performance using various time series evaluation metrics.

II. RELATED WORKS
Machine learning and data science researchers have been working on time series forecasting. There is a growing need for precise and effective forecasting techniques as time series data from numerous industries, including banking, healthcare, and energy, become more widely available. In this study, the researchers highlight a few recent developments in time series forecasting and talk about their advantages and disadvantages.
Deep learning is one of the most well-liked methods for time series forecasting and has demonstrated promising outcomes in a number of fields. In order to capture long-term relationships in the data, [9] introduced a unique deep learning architecture for time series forecasting dubbed Longformer-TS. This design makes use of the self-attention mechanism. On numerous benchmark datasets, they showed that Longformer-TS outperformed a number of cutting-edge models.
The introduction of probabilistic forecasting techniques, which offer a distribution of potential future values rather than a single point estimate, is another new development in time series forecasting. This might be helpful when making decisions that call for quantifying uncertainty and assessing risk. In order to model the uncertainty in the data, [10] for instance suggested a deep probabilistic forecasting framework that makes use of a Bayesian neural network. On real-world electrical load forecasting tests, they demonstrated that their system beat numerous conventional and deep learning approaches.
There has been an increase in interest in applying meta-learning techniques for time series forecasting in addition to deep learning and probabilistic approaches. By utilizing prior experience on related tasks, meta-learning seeks to discover the best algorithm or hyperparameters for a current task. For instance, [11] suggested a meta-learning framework for time series forecasting that uses a neural network to train a set of hyperparameters that can adapt to various forms of time series data. On a number of benchmark datasets, they showed that their method delivered state-of-the-art performance.
For predicting tourist arrivals, [12] introduced a neural network model termed the Deep Travel Demand Forecasting Network (DTDNet). A prediction module and a feature learning module make up the model's two primary parts. While the prediction module employs a Long Short-Term Memory (LSTM) network to identify temporal connections and generate predictions, the feature learning module employs a deep convolutional neural network (CNN) to extract pertinent features from the raw data. On three tourism datasets from various areas, the authors examined the DTDNet model and found that it beat other cutting-edge models, including ARIMA, SARIMA, and VAR.
A hybrid neural network model that combines a convolutional neural network (CNN) and a recurrent neural network (RNN) for forecasting tourist arrivals was proposed by [13]. The data is processed using the CNN to extract spatial features, and the RNN to identify temporal connections. The model was tested on a Chinese tourism dataset, and the results demonstrated that it beat numerous conventional models like ARIMA and exponential smoothing.
Another study, [14], suggested a neural network model for anticipating tourist arrivals termed the Multivariate Attention-based Temporal Convolutional Network (MATCN). The model combines a temporal convolutional network (TCN) and an attention mechanism to identify both short- and long-term dependencies in the input. On a Thai tourist dataset, the authors tested the MATCN model and found that it performed better than numerous conventional models, including ARIMA and Holt-Winters.

III. METHODOLOGY
Datasets for the study were mainly collected from three Philippine government organizations, i.e., the Tourism Department and Development Planning of the Department of Tourism, Research and Information Management, and the Statistics, Economic Analysis, and Information Management Division. The data collected from the Philippines' tourism demand statistics include the monthly number of inbound tourist arrivals to the Philippines from January 2008 to October 2022 [15].

A. Descriptive Analysis
Inbound travel to the Philippines increased dramatically between 2008 and 2021 (Fig. 1). From 3.14 million in 2008 to 8.26 million in 2019, there were more than twice as many foreign and overseas Filipino tourists visiting the country [16].
Although the graph indicates an increase in inbound tourist arrivals to the Philippines over the years, there were notable drops in tourist arrivals in 2009, 2014, and 2020. In 2009, the number of arrivals decreased by 1.5% compared to 2008 [17]. Several upsetting incidents occurred in the global tourism industry in 2014, such as the emergence of SARS, avian influenza, Ebola, and MERS-CoV infections [18]. In 2020, the coronavirus pandemic evidently caused a sudden decline in tourist arrivals and a subsequent drop in the country's tourism demand, which led to the loss of millions of jobs, severe economic hardship, and the demise of many enterprises [19]. Many jobs and businesses, particularly the micro, small, and medium-sized ones that catered to tourists or belonged to related industries, were at risk due to the closure of borders, main entry points, and hotels, as well as restrictions on mass gatherings, land travel, and related services worldwide.
www.ijacsa.thesai.org
Travel and tourism competitiveness in the Philippines continues to rise globally, continuing a trend that has been present for the past eleven years [20]. In 2019, 8.3 million tourists (including non-resident overseas Filipinos) visited the country, primarily from Korea, China, the USA, Japan, and Taiwan. These made up 70% of all visitors during the year. Tourist arrivals in the Philippines, however, fell by 80% in the first quarter of 2020 and by about 90% in 2021 for both foreign and overseas Filipino visitors; these are the periods when most countries started implementing travel restrictions and lockdowns [21].
Fig. 2 shows the adapted and modified research framework of the study [22]. It involves data collection and preprocessing, feature extraction, model development, and performance evaluation. The researchers first gathered tourist arrival statistics from the Department of Tourism's website. Second, the data were preprocessed to extract valuable and target information. Third, the time series machine learning model was trained and tested. Finally, the model's performance was evaluated.

1) Data collection:
To perform this study, the researchers collected the actual inbound tourist arrivals to the Philippines between 2008 and 2022 from the Department of Tourism's official website. The datasets were analyzed and visualized using exploratory data analysis in Python. This was done to determine what the tourism data could reveal beyond the formal modeling [23]. The corresponding data are presented in Table I.

2) Data preparation:
Two temporal features were extracted in the preprocessing stage: month and year. The extracted data were preprocessed using Long Short-Term Memory (LSTM) as the feature extraction technique, capturing the complex dependencies between past and current tourist arrivals. The study conducted the time series analysis following the theory of the autoregressive (AR) model (Equation 1). The model predicts a value using observations from prior time steps as input. The order of the model is determined by the number of samples used for prediction, n_r [24].
$$ y(t) = \sum_{i=1}^{n_r} a_i \, y(t-i) + e(t) \qquad (1) $$
where y(t) is the number of tourist arrivals at time step t, a_i are the model coefficients, and e(t) is the error term.
The AR model was translated into a neural network, which requires lagged inputs. To that end, the dataset was preprocessed so that the input values were created as lagged values. From the original dataset, which has a month column and a tourist arrivals column, the previous 12 months of the seasonal time series were taken as lags and used as inputs to predict the current month. The first twelve months, January through December 2008, were eliminated, and a new dataset that began in January 2009 and ended in October 2022 was established with twelve additional lag variables. Then, a traditional neural network was formed with twelve features and a target variable, the number of tourist arrivals.
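The lag construction described above can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing script: the function name `make_lagged_dataset` and the sample numbers are hypothetical, and the real input would be the monthly Department of Tourism series.

```python
from typing import List, Tuple

N_LAGS = 12  # the previous twelve months serve as inputs, as described above


def make_lagged_dataset(
    series: List[float], n_lags: int = N_LAGS
) -> Tuple[List[List[float]], List[float]]:
    """Turn a monthly series into (lag features, target) pairs.

    Each feature row holds the n_lags values that precede the target
    month, ordered oldest first; the target is the value of that month.
    The first n_lags months have no complete history and are dropped.
    """
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])
        targets.append(series[t])
    return features, targets


# Hypothetical monthly arrivals (placeholder numbers, not the actual statistics):
# two years of data, so the whole first year is consumed as lags.
arrivals = [float(100 + m) for m in range(24)]
X, y = make_lagged_dataset(arrivals)
```

Applied to the 178 monthly observations from January 2008 to October 2022, this construction would yield 166 rows, the first of which targets January 2009, consistent with the reduced dataset described in the text.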

3) Model development:
To build and develop the model, the dataset was split into training and test sets. The criterion for data decomposition in this study is the trend in which tourist arrivals show upward and downward movements, particularly during the COVID-19 pandemic. The dataset was decomposed into three compositions: (a) January 2008 to January 2020, with 82% as the training data and 18% as the testing data; (b) January 2008 to March 2020, with 83% as the training data and 17% as the testing data; and (c) January 2008 to December 2020, with 88% as the training data and 12% as the testing data (Fig. 3). The cutoff dates correspond to the following events:
• January 2020 (the period when the government confirmed the first case of the new coronavirus): In the middle of January 2020, the Philippines reported its first coronavirus case [25] [26].
• March 2020 (the period when the government implemented the suspension of arrivals): According to the Bureau of Immigration (BI), entry into the Philippines was prohibited as of March 22, 2020, with the exception of foreign spouses and children of Filipino citizens, diplomats, and employees and officials of international organizations [27].
• December 2020 (the period when the government restricted main entry points amid the new COVID-19 strain): The BI announced the prohibition on travelers from 20 countries on December 29, 2020, as an additional step to stop the spread of the reported new strain of the COVID-19 virus [28].
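As a rough check on the three compositions, the sketch below counts months and computes the training share for each cutoff. It assumes that each date range marks the training period and that the shares are taken over all 178 months from January 2008 to October 2022; the paper does not state either point explicitly, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def months_inclusive(start_year: int, start_month: int,
                     end_year: int, end_month: int) -> int:
    """Number of calendar months from start to end, counting both ends."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


# Whole study period: January 2008 to October 2022.
TOTAL_MONTHS = months_inclusive(2008, 1, 2022, 10)

# Last training month of each composition (assumed reading of the text).
CUTOFFS: Dict[str, Tuple[int, int]] = {
    "a": (2020, 1),   # first confirmed coronavirus case
    "b": (2020, 3),   # suspension of arrivals
    "c": (2020, 12),  # restrictions amid the new strain
}


def split_summary() -> Dict[str, Tuple[int, int, float]]:
    """Return (train months, test months, train share in %) per composition."""
    summary = {}
    for name, (year, month) in CUTOFFS.items():
        n_train = months_inclusive(2008, 1, year, month)
        summary[name] = (n_train, TOTAL_MONTHS - n_train,
                         100.0 * n_train / TOTAL_MONTHS)
    return summary


def chronological_split(rows: List, n_train: int) -> Tuple[List, List]:
    """Split without shuffling so the test set always follows the training set."""
    return rows[:n_train], rows[n_train:]


summary = split_summary()
```

Under these assumptions the training shares come out at about 81.5%, 82.6%, and 87.6%, close to the reported 82%, 83%, and 88%. The split is chronological rather than random because shuffling a time series would leak future observations into training.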
The datasets were loaded into the Orange Data Mining tool for time series prediction using an artificial neural network model based on the architecture of an MLP network. Orange is an open-source data mining toolkit that provides a platform for data visualization and predictive modeling [29].
The researchers adapted the architecture of the multilayer perceptron (MLP), which serves as a supplement to the feedforward neural network. As depicted in Fig. 4, it has three distinct types of layers: input, hidden, and output. The input layer receives the input signal for processing, and the output layer completes the required task, such as classification or prediction [30].
The multilayer perceptron is a feedforward design based on the perceptron neuron model and is one of the most popular topologies for forecasting time series [31].
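To make the input, hidden, and output layers concrete, the following is a minimal forward-pass sketch of an MLP with twelve lagged inputs and a single output. It is not the Orange configuration used in the study: the hidden-layer size of 8, the ReLU activation, and the random untrained weights are all assumptions chosen for illustration, and training by backpropagation is omitted.

```python
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def relu(value: float) -> float:
    """Rectified linear activation (an assumed choice for the hidden layer)."""
    return max(0.0, value)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Create a fully connected layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs: List[float], layer: Layer) -> List[float]:
    """Weighted sum plus bias for every neuron in the layer."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def mlp_forward(inputs: List[float], hidden: Layer, output: Layer) -> float:
    """Input layer -> hidden layer (ReLU) -> linear output neuron."""
    hidden_activations = [relu(v) for v in dense(inputs, hidden)]
    return dense(hidden_activations, output)[0]


rng = random.Random(0)        # fixed seed so the sketch is reproducible
N_INPUTS, N_HIDDEN = 12, 8    # 12 lags from the text; 8 hidden units is an assumption
hidden_layer = init_layer(N_INPUTS, N_HIDDEN, rng)
output_layer = init_layer(N_HIDDEN, 1, rng)

lags = [0.1 * i for i in range(N_INPUTS)]  # hypothetical scaled lag values
prediction = mlp_forward(lags, hidden_layer, output_layer)
```

The output neuron is left linear because the target, the number of arrivals, is a continuous quantity rather than a class label.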

4) Model evaluation:
The trained network was simulated on the test dataset, and its predictions were compared against the actual values of the test set to assess the error.
This section describes the evaluation of the Artificial Neural Network model with the three distinct data compositions. Each model's prediction accuracy was assessed using the mean absolute percentage error (MAPE: Equation (2)), coefficient of determination (R²: Equation (3)), mean absolute error (MAE: Equation (4)), and root mean squared error (RMSE: Equation (5)) [32,33]. The MAPE is used to check how close estimates or forecasts are to actual values. Both RMSE and MAE measure the size of the errors in a set of predictions. The primary purpose of R² is to assess how similar the predicted and actual time series are [34].
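For reference, the usual textbook definitions of these four metrics are given below, in the order in which they are cited above. The notation is assumed here rather than taken from the paper: y_i is the actual number of arrivals for test observation i, \hat{y}_i the corresponding prediction, \bar{y} the mean of the actual values, and n the number of test observations.

```latex
\begin{align*}
\mathrm{MAPE} &= \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\end{align*}
```

MAPE is scale-free, which makes it easy to interpret, but it is undefined whenever an actual value y_i equals zero.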
The MSE, RMSE, MAE, MAPE, and R² computed from the predicted values over the test set are presented for the three data compositions. Furthermore, the selection is based on the highest R² and the lowest MAPE because these show whether the model is a good fit for the observed values, as well as how good the fit is.
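A minimal sketch of how these metrics could be computed on a test set is shown below, using the standard definitions of each metric. The function names and the four sample values are hypothetical and are not the study's data.

```python
import math
from typing import List


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: List[float], predicted: List[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical test-set values, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 380.0]

scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R2": r_squared(actual, predicted),
}
```

Computing the same `scores` dictionary for each data composition and then picking the one with the highest R² and lowest MAPE mirrors the selection rule described above.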

IV. RESULTS AND DISCUSSIONS
Considering the results presented in Table II, the adapted Artificial Neural Network (ANN) model trained with the third data composition was the best-performing model for predicting tourist arrivals across the different data compositions, which supports the view that ANN is a reliable model for time series prediction [35].
The results show that the prediction model trained with the third training set has the highest coefficient of determination (0.926) and the lowest MAPE (13.9%), meaning its forecasted values are closest to the actual ones [36]. These results show that training the model with enough data to cover unexpected events, such as the coronavirus pandemic, improves model accuracy [37]. As a previous study pointed out, researchers need to develop prediction models that can take unplanned scenarios into account [38]. Overall, the best predictor is the ANN model that uses the data composition containing the COVID-19 pandemic period.
The blue line represents the actual tourist arrivals, while the red line represents the predicted values. Similar to the result in the study of [39], the outputs portrayed in Fig. 7 confirm the competence of the ANN model in predicting tourist arrivals in the Philippines. Other researchers confirm that the ANN model has the capacity to outperform other time series techniques, such as ARIMA models [40,41] and Box-Jenkins and exponential smoothing models [42].

V. CONCLUSION
This study compares three distinct data compositions in training datasets of the Philippines' tourism demand from 2008 to 2022 using an Artificial Neural Network model. The first training set ends in the period when the government confirmed the first case of the new coronavirus; the second ends in the period when the government implemented the suspension of arrivals; and the third ends in the period when the government implemented restrictions at main entry points amid the new COVID-19 strain. To determine the best model and to see whether the model is a good fit for the observed values, as well as how good the fit is in predicting and forecasting arrivals, the model was evaluated using different time series evaluation metrics, namely mean absolute percentage error (MAPE), coefficient of determination (R²), mean absolute error (MAE), and root mean squared error (RMSE).
The findings showed that the ANN model is reliable in predicting tourist arrivals, with an R-squared value of 0.926 and a MAPE of 13.9%. Furthermore, adding training data that contain an unexpected phenomenon, such as the COVID-19 pandemic, improved the prediction model's accuracy and learning process. As the technique has demonstrated its prediction accuracy, it would be a useful tool for the government, tourism stakeholders, and investors, among others, to enhance strategic and investment decisions.

VI. FUTURE WORK
To further the study, a combination of neural networks with fuzzy logic or other time series forecasting models could be explored for more reliable and accurate results. In addition, other external data sources, such as online forums, reviews in travel apps, and social media posts, can be added to further enrich the dataset.