Tree-based Machine Learning and Deep Learning in Predicting Investor Intention to Public Private Partnership

Abstract: Public-private partnership (PPP) is a government initiative to accelerate the development of public infrastructure. However, the scheme exposes the private sector to various risks, including political risk, which in turn affect the financial performance and reporting of participating firms. One of the issues facing the government is the lack of private sector participation in such arrangements. Thus, the main objective of this study is to examine machine learning prediction models of private investors' intention to participate in the PPP program. Tree-based machine learning and deep learning are two different types of promising algorithms that have proven useful in a wide range of prediction problems but have never been tested on the problem addressed in this study. Based on real data from investors in Indonesian listed firms, this paper evaluates the selected machine learning algorithms from two points of view. The first assessment concerns the algorithms' performance in producing accurate predictions. The second identifies how the PPP attributes vary in their contribution to each prediction model. The performance results show that all prediction models built with the machine learning algorithms and the PPP attributes were well fitted, with R squared above 80%. The findings contribute knowledge that scholars in various fields can use for more in-depth analysis of machine learning methods and investor prediction.


I. INTRODUCTION
The degree of development of a nation is determined by its capacity to provide for the needs of the general population, including infrastructure such as power, internet, trains, roads, ports, and airports. Infrastructure investments may improve livelihoods and well-being because they make people's lives easier and better [1], [2]. However, many countries face several issues and difficulties when constructing public infrastructure. Financial factors, regional factors, demographic concerns, environmental difficulties, and human factors are some of the key causes of infrastructure development delays [3], [4].
In response to the financial constraint, the government has initiated alternative funding programmes for the construction of public infrastructure and invited the private sector to participate. This special arrangement is known as a public-private partnership (PPP). Since it was first used in the United Kingdom in the early 1990s, PPP has spread to a number of developing countries, including Chile, Colombia, and Brazil in Latin America; Bulgaria and Slovenia in Eastern Europe; and Indonesia, Malaysia, and South Korea in Asia [5]-[7]. However, the degree to which PPPs are successfully implemented varies from country to country. For example, in Indonesia, despite the surge in PPP projects for infrastructure development, private sector participation is still low because the scheme is dominated by Indonesian state-owned enterprises. Hence, this study aims to develop machine learning prediction models of private investors' intention to participate in the PPP program. This study contributes by extending prior work on PPP with non-financial factors, which are analyzed using machine learning prediction models.
The remainder of this paper is organized as follows. Section II reviews the related literature and is followed by a brief description of the data set and the empirical steps of the machine learning research, together with a comparison of the machine learning performances based on the PPP attributes. Section IV presents the results and discussion. Finally, Section V presents the conclusions and future research directions.

II. LITERATURE REVIEW
Prior research that applied machine learning to prediction, classification, and detection studies in accounting and finance highlights the effectiveness and accuracy of such methods relative to traditional statistical methods in problems such as financial fraud detection [8], firm performance [9], and finance [10], [11]. Despite the wider use of machine learning in accounting and finance, studies on machine learning prediction and classification for PPP investment are limited. To date, most prior studies have used machine learning to predict successful PPP projects [12]-[16], highlighting the benefits of deploying this intelligent approach to solve various PPP issues. In [16], Random Forest was identified as the best-performing algorithm, but that study examined different PPP attributes from those used in this study. Most of these researchers suggested that more advanced research on machine learning algorithms is needed.
Prior studies on the determinants of PPP highlight several financial, economic, and environmental factors, such as general and operational risk [17], [18]; return [18]; operational and cost recovery [19]; ownership [20]; risk of political instability [21]; risk of exogenous uncertainty [22]; and environmental and development impacts and sustainability [20]. However, few studies have examined how non-financial factors, such as investor trust in the government, influence investors' decisions to finance public infrastructure through PPP arrangements. To deepen the current understanding of private investors' intention to participate in PPP programs, this study examines non-financial factors, including government trust, service quality, transparency, and similarity of values, as determinants of PPP. Unlike prior work, this paper presents findings from the implementation of machine learning prediction models.
To date, hundreds of machine learning algorithms are available for various problem domains, such as Decision Tree [23], Random Forest [24], Support Vector Machine [24], and Gradient Decision Trees [23]. Decision Tree, Random Forest, and Gradient Decision Trees are categorized as tree-based machine learning. Tree-based machine learning is robust for both regression and classification problems. Additionally, Deep Learning [25] is a very promising algorithm for prediction problems with a massive number of attributes. Studying its ability on the small set of attributes proposed in this study will therefore provide further useful knowledge to researchers from various fields. The findings will benefit the relevant PPP stakeholders by supporting fast, data-driven recommendations and decisions.

A. Dataset
The machine learning algorithms were tested on a dataset consisting of responses from 165 top managers of Indonesian listed firms. Based on the collected data, a Pearson correlation test was conducted to observe the correlation coefficient of each PPP attribute, which serve as the independent variables (IVs). Fig. 1 presents the correlation coefficients of the PPP attributes with the dependent variable (DV). The DV in the prediction model is the investor's intention to participate in PPP. The PPP attributes are the investment intention factors, namely trust in government (TrustOnGov), perceived government service quality (GovServiceQ), perceived government transparency (GovTransparency), and similarity of values (ValueSimilarity).
Two of the PPP attributes show strong positive correlations (correlation coefficient above 0.7) with investor intention, while the other two attributes have moderate correlation coefficients (0.5-0.6). In machine learning predictive models, one important matter to observe is the contribution of each attribute in providing knowledge for the model to make predictions. This research addresses two questions. RQ1: Are all four PPP attributes usable in each of the machine learning prediction models? RQ2: Which of the attributes is most important across all the machine learning models?
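The correlation screening described above can be illustrated with a minimal sketch. It uses only the Python standard library and synthetic data, not the study's survey responses. The attribute names come from the text; the generating coefficients, the `intention` variable, and the "weak" label for coefficients below 0.5 are assumptions made for illustration.

```python
import random
from math import sqrt

random.seed(0)

ATTRIBUTES = ["TrustOnGov", "GovServiceQ", "GovTransparency", "ValueSimilarity"]

# Synthetic stand-in for 165 responses; NOT the study's data.
n = 165
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ATTRIBUTES}
# Hypothetical DV: driven mostly by TrustOnGov, partly by ValueSimilarity, plus noise.
intention = [
    0.8 * data["TrustOnGov"][i]
    + 0.3 * data["ValueSimilarity"][i]
    + random.gauss(0, 0.5)
    for i in range(n)
]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def strength(r):
    """Label a coefficient using the bands quoted in the text (the 'weak' band is assumed)."""
    magnitude = abs(r)
    if magnitude > 0.7:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    return "weak"


coefficients = {name: pearson(values, intention) for name, values in data.items()}
for name, r in sorted(coefficients.items(), key=lambda item: -abs(item[1])):
    print(f"{name:16s} r = {r:+.2f} ({strength(r)})")
```

With real survey data, the `data` dictionary and `intention` list would be replaced by the recorded attribute scores and the recorded intention score.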

B. Machine Learning
Two machine learning algorithms from the tree-based family, namely Random Forest (RF) and Decision Tree (DT), were used in this study. Additionally, given the wide use of the Deep Learning algorithm, it is interesting to observe its ability in the case of PPP investor intention.
Based on a preliminary series of experiments, the optimal hyper-parameters of RF and DT are listed in Table I. The numbers of trees used to observe the RF accuracy were 20, 60, 100, and 140. Across the combinations, the worst error rate produced by RF was 11.6%, while the best error rate was 10.5%, obtained with 140 trees and a maximal depth of 4. For DT, the maximal depth used in the preliminary testing ranged between 2 and 25. DT achieved its best error rate, 11.8%, at a maximal depth of 4, slightly lower than the 11.9% obtained at a maximal depth of 2. An identical error rate of 12.8% was produced when the maximal depth was set to 7, 10, 15, or 25.
The predictions generated by Deep Learning are depicted in Fig. 2. The network has 4 layers; it uses the Rectifier function in the earlier layers and a Linear function in the output layer (layer 4).
Table II presents the performance comparison between the three machine learning algorithms by means of Root Mean Square Error (RMSE), R squared (R²), and the time required to complete (TTC) the processes of training and predicting. RMSE measures the prediction error of the machine learning models, while R² represents the proportion of the variation in the DV that is explained by the IVs in the prediction model. Table II shows that all the predictive models with the PPP attributes achieved an R² of at least 80%. Random Forest provided the best results, in line with the results in [16]; however, in this case, which uses different PPP attributes, the R² is lower than the results in [16].
Although the average errors reported as Relative Error and RMSE for all models are not very encouraging, the values are within an acceptable range, suggesting that the machine learning models can predict the data with reasonable accuracy. All the algorithms are reliable when tested on the PPP dataset, as indicated by the small standard deviation ranges of the results.
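A hyper-parameter sweep of this kind can be sketched as follows. The paper does not name its software, so scikit-learn is used here as a substitute, and its `n_estimators` and `max_depth` parameters are assumed to correspond to the number of trees and the maximal depth. The data are synthetic, the 70/30 split is an assumption, and the RF depth is fixed at 4 because the full RF depth grid is not given in the text.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for 165 responses and four PPP attributes; NOT the study's data.
X = rng.normal(size=(165, 4))
y = 0.8 * X[:, 0] + 0.3 * X[:, 3] + rng.normal(scale=0.4, size=165)

# Assumed 70/30 hold-out split; the paper does not state its validation scheme here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def evaluate(model):
    """Fit the model, predict on the hold-out set, and return (RMSE, R squared, seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
    return rmse, float(r2_score(y_test, predictions)), elapsed


results = {}

# Random Forest: tree counts from the text, depth fixed at 4 (assumption).
for n_trees in (20, 60, 100, 140):
    forest = RandomForestRegressor(n_estimators=n_trees, max_depth=4, random_state=0)
    results[("RF", n_trees, 4)] = evaluate(forest)

# Decision Tree: the maximal depths reported in the text.
for depth in (2, 4, 7, 10, 15, 25):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    results[("DT", 1, depth)] = evaluate(tree)

for (name, n_trees, depth), (rmse, r2, seconds) in results.items():
    print(f"{name} trees={n_trees:3d} depth={depth:2d} "
          f"RMSE={rmse:.3f} R2={r2:.3f} time={seconds:.3f}s")
```

The numbers printed by this sketch reflect only the synthetic data and should not be compared with the values reported in Table I or Table II.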
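For reference, the error measures named above can be computed as in the following sketch. RMSE and R squared follow their standard definitions. The paper does not define Relative Error, so the mean of |actual - predicted| / |actual| is assumed here. The sample values are made up for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Proportion of the variation in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total


def relative_error(actual, predicted):
    """Assumed definition: mean absolute residual relative to the actual value."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)


# Made-up intention scores and predictions, for illustration only.
actual = [3.0, 4.0, 5.0, 4.0]
predicted = [2.5, 4.0, 5.5, 4.0]

print(f"RMSE           = {rmse(actual, predicted):.4f}")
print(f"R squared      = {r_squared(actual, predicted):.4f}")
print(f"Relative error = {relative_error(actual, predicted):.4f}")
```

Under these definitions, a lower RMSE or Relative Error means smaller prediction errors, while an R squared closer to 1 means the attributes explain more of the variation in the intention score.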

IV. RESULTS AND DISCUSSION
In terms of efficiency, DT took the shortest time to complete, followed by Deep Learning and RF. Furthermore, it is useful to look at how the weight of each PPP attribute varies across the machine learning predictive models, as listed in Table III. Table III shows that all the PPP attributes contributed to some degree to the prediction models of all the machine learning algorithms. Thus, all the PPP attributes considered in this study were used by the machine learning algorithms, which answers RQ1. Deep Learning received larger weights from trust in government and similarity of values. For all algorithms, trust in government had a major influence on the prediction models, with a weight above 0.5, which answers RQ2. The lowest influence among the PPP attributes comes from government service quality.
Fig. 3 and Fig. 4 show the tree models generated by DT and RF, respectively. The depth of the tree models from DT and RF is 4, as set by the optimal maximal depth. Trust in government is the deepest attribute before the leaves in both trees, which indicates its important contribution to the model. Although the weights of similarity of values and government transparency are very low in DT and RF, the two attributes are still included in the tree models. In contrast, government service quality is used in RF but not in DT.
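One way to inspect attribute contributions in tree models is sketched below. It uses scikit-learn's impurity-based `feature_importances_`, which is an assumed stand-in and not necessarily the weighting scheme behind Table III. The data are synthetic, and the helper `used_attributes` is a hypothetical name introduced for this illustration. It also lists which attributes appear in at least one split, since a depth-limited tree may leave an attribute unused.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

ATTRIBUTES = ["TrustOnGov", "GovServiceQ", "GovTransparency", "ValueSimilarity"]

rng = np.random.default_rng(0)

# Synthetic stand-in for 165 responses; NOT the study's data.
X = rng.normal(size=(165, 4))
y = 0.8 * X[:, 0] + 0.3 * X[:, 3] + rng.normal(scale=0.4, size=165)

# Depth 4 and 140 trees follow the optimal settings quoted in the text.
dt = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
rf = RandomForestRegressor(n_estimators=140, max_depth=4, random_state=0).fit(X, y)


def used_attributes(fitted_tree):
    """Names of attributes used in at least one split (leaf nodes have negative indices)."""
    return {ATTRIBUTES[i] for i in fitted_tree.tree_.feature if i >= 0}


dt_used = used_attributes(dt)
rf_used = set().union(*(used_attributes(member) for member in rf.estimators_))

for name, w_dt, w_rf in zip(ATTRIBUTES, dt.feature_importances_, rf.feature_importances_):
    print(f"{name:16s} DT weight={w_dt:.3f} (used: {name in dt_used})  "
          f"RF weight={w_rf:.3f} (used: {name in rf_used})")
```

An attribute with an importance of zero in the single tree can still be used somewhere in the forest, because each forest member is trained on a different bootstrap sample.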

V. CONCLUSIONS
This study aims to explore the non-financial issues that influence investor intention in public-private partnerships (PPP), based on evidence from Indonesia. Acknowledging the role of machine learning in providing fast, data-driven prediction and the complexity of handling growing data in the future, this study explored machine learning techniques and observed the effect of pre-determined non-financial factors on the success of PPP, including Government Trust, Similarity of Values, Transparency in Government, and Service Quality of the Government. Within the scope of the tested data on PPP in Indonesia, Government Trust is the most significant factor in the prediction models, both inside and outside the machine learning implementations. This study can be extended in the future by considering more PPP attributes, as suggested in [16], which may improve the accuracy of the machine learning models, and by using machine learning algorithms other than those studied in this paper.