An Optimized Artificial Neural Network Model using Genetic Algorithm for Prediction of Traffic Emission Concentrations

Abstract: Global warming and climate change have become universal issues. One of the leading sources of climate change is automobiles, which are also the prime source of air pollution in urban areas globally. This has made it difficult to develop an automatic traffic management system for capturing and monitoring vehicles’ hourly and daily passage. With the significant advancement of sensor technology, atmospheric information such as air pollution, meteorological, and motor vehicle data can be harvested and stored in databases. However, due to the complexity and non-linear associations between air quality, meteorological, and traffic variables, it is difficult for traditional statistical and mathematical models to analyze them. Recently, machine learning algorithms have become a popular tool for traffic emissions prediction. Meteorological and traffic variables influence the variation and the trend of traffic pollutants. In this paper, an optimized artificial neural network (OANN) was developed to enhance the existing artificial neural network (ANN) model by updating the initial weights in the network using a Genetic Algorithm (GA). The OANN model was implemented to predict the concentration of CO, NO, NO2, and NOx pollutants produced by motor vehicles in Kuala Lumpur, Malaysia. OANN was compared with Artificial Neural Network (ANN), Random Forest (RF), and Decision Tree (DT) models. The results show that the developed OANN model performed better than the ANN, RF, and DT models, with the lowest MSE values of 0.0247 for CO, 0.0365 for NO, 0.0542 for NO2, and 0.1128 for NOx. It can be concluded that the developed OANN model is a better choice for predicting traffic emission concentrations. The developed OANN model can help environmental agencies monitor traffic-related air pollution levels efficiently and take necessary measures to ensure the effectiveness of traffic management policy. The OANN model can also help decision-makers mitigate traffic emissions to protect citizens living in the neighborhood of highways.


I. INTRODUCTION
Global warming and climate change have become universal issues in recent years [1,2]. One of the leading sources of climate change is emissions from motor vehicles. Carbon monoxide (CO), nitrogen dioxide (NO2), carbon dioxide (CO2), and nitrogen monoxide (NO), all of which can be emitted by motorized vehicles, are among the significant risks to human health and the environment [3]. Exposure to road transport emissions can increase the risk of lung cancer [4], respiratory and cardiovascular effects [5,6], pulmonary and chronic diseases [7], and mortality [8,9]. In 2015, air pollution in general was responsible for over 6 million deaths worldwide [10,11]. It causes more than 400,000 premature deaths in Europe [12] and around 7 million worldwide [13].
Automobiles are the prime source of air pollution in cities worldwide. For example, 80.49% of emissions in Beijing, China, were produced by motor vehicles [14]. It was also found that 80% of air pollution in the Lima Metropolitan Area was produced by automobiles [13]. In China, 85% of emissions were from transport [15], compared with 92% in the United Kingdom [16] and 75% in Malaysia [17], while in the United States automobiles are responsible for 57% of air pollution [7]. Other studies show that 92% of CO and 65% of hydrocarbon (HC) pollutants were emitted from transportation activities in Shanghai [18], and 60% of NOx and PM emissions were from heavy-duty trucks in China [19].
The variation and trend of air pollution strongly depend on meteorological parameters and traffic characteristics [20]. A study conducted in Shiraz, Iran, confirmed that meteorological conditions increase the air pollution level [21]; a similar result was found in Karaj, Iran [22]. The relationship between air pollution and meteorological conditions was studied in Beijing and Nanjing, China, and the study reveals that air pollution concentrations depend on meteorological factors [23]. Research in Linfen city, China, shows negative correlations between air pollutants and meteorological parameters [24]. A study was conducted in Penang, Malaysia, to investigate the sources contributing to air pollution concentrations. Five air pollutants were investigated. The result shows a negative correlation between relative humidity and CO, O3, SO2, PM10, and NO2. Another study examined air pollution variation due to meteorological conditions in four areas in Malaysia, namely Petaling Jaya, Cheras, Shah Alam, and Klang. The result reveals that meteorological parameters influenced the seasonal trend of air pollution. The association between air pollution and traffic characteristics has also been investigated. A study was conducted to compare the impact of traffic volume on air pollution levels during COVID-19 in Italy, using data from 2017 to 2018 (before COVID-19) and from 2020 (during COVID-19). The result reveals that traffic volume significantly impacts PM10, NOx, NO, and NO2 concentrations [28].
The study in [29] investigated the exposure of cyclists in Brazil to air pollution produced by vehicles. The research indicates an increase in motor vehicles during the morning and evening peak hours, and this growth in the number of vehicles increases the level of air pollution. In addition, the effect of traffic characteristics on air pollution level was studied in Japan [20]. The result indicates that low vehicle speed increases the pollution level. Similarly, traffic volume and congestion increase the emission of air pollution in Kyoto, Japan. It was also found that trucks are the main contributor to PM and NOx emissions. Furthermore, a study in Kuala Lumpur was conducted to investigate the effect of traffic characteristics on the air pollution level. The study reveals that the air pollution level strongly depends on fuel consumption, traffic volume, vehicle speed, and waiting time on the road. The result also shows that lower traffic congestion reduces the level of air pollution in Kuala Lumpur [30].
With the rapid advancement of sensor technology, atmospheric information such as air pollution, meteorological, and motor vehicle data can be collected and stored in databases. Due to the complexity and non-linear associations that exist between air quality, meteorological, and traffic variables, it is difficult for traditional statistical and mathematical models to analyze them [31,32]. Lately, machine learning algorithms such as long short-term memory, random forest, support vector machine, decision tree, and artificial neural network (ANN) have become popular in traffic-related air pollution prediction [33]. The ANN model appears to be the most used model for predicting traffic emissions because it reduces time, cost, and complexity. It also provides fast and accurate predictions with less error, yielding values closer to the observed values [34]. ANN can handle complex multidimensional variables and non-linear problems related to traffic emission concentrations arising from meteorological conditions and traffic features [34,35].
In this paper, an optimized artificial neural network (OANN) was developed to enhance the existing artificial neural network (ANN) model by updating the weights in the network using a Genetic Algorithm (GA). The OANN model was implemented to predict the concentration of CO, NO, NO2, and NOx pollutants produced by motor vehicles in Kuala Lumpur, Malaysia. The remainder of the paper is structured as follows. Section II discusses the related work on vehicle emissions prediction using the ANN model. Section III presents the methodology used in this study. Section IV presents the results and the comparison with existing machine learning models for evaluation. Finally, the conclusion is discussed in Section V.

II. RELATED WORK
Motor vehicles produce harmful pollutants that disperse into the atmosphere [33]. These pollutants have a significant impact on human health [36]. Several statistical models have been developed for predicting traffic emission concentrations at intersections, street canyons, streets, near schools, and many other locations. However, these statistical models could not predict the emission rate well because of the variability and influence of meteorological variables and traffic parameters [31]. Machine learning models have recently been applied and were able to predict the concentrations of pollutants emitted from motor vehicles. ANN has become the most popular model for predicting traffic-related air pollution [34]. These models are highly dependent on the independent variables provided in the study, yet many studies lack either meteorological data or traffic data [37]. Several studies suggest that meteorological and traffic variables influence the trend and variation of traffic pollutants [33], such as relative humidity, temperature, wind direction, and wind speed [38], traffic volume and vehicle speed [39], types of vehicle, etc. [40].
The variables mentioned above are needed to predict emission levels, but they are not always available [37]. For example, carbon monoxide (CO) concentrations at the Jiyin Ave and Shuanglong intersections were predicted by [41] using Gated Recurrent Unit (GRU) neural networks in the absence of a meteorological dataset. The model performed well, with a root mean squared error (RMSE) of 0.088 and a mean absolute error (MAE) of 0.056. [42] compared machine learning algorithms and selected the best-performing model to reduce the effect on climate change of greenhouse gas (GHG) emitted by passenger vehicles in Canada. The Artificial Neural Network (ANN) performed better than the other machine learning algorithms, with an RMSE of 0.442 and an MAE of 0.347. This study also lacks a meteorological dataset. An ANN was developed by [43] to predict the emission of CO, CO2, NOx, and HC from a liquefied natural gas bus in Zhenjiang, China, without considering meteorological features. The prediction of CO2 was unsatisfactory, with an MSE of 52, but the predictions of the remaining contaminants were good, with MSE values of 2.23 for CO, 0.68 for HC, and 9.4 for NOx. Carbon monoxide was predicted using a Non-linear Autoregressive Exogenous (NARX) based neural network in the absence of traffic data in Shiraz, Iran [44]. The proposed model performed well compared to previous models, with an RMSE of 0.43, R2 of 0.31, and MAPE of 51.
Some studies used three datasets, namely air quality, meteorological, and traffic datasets. ANN was applied to predict the level of NO, NO2, O3, NOx, CO2, PN10, NH3, PM10, PM2.5, and PM1 pollutants from on-road vehicles at a street canyon in Germany. The model has a low RMSE for some pollutants and a high RMSE for others, which shows that the model has to be improved for predicting the latter. The RMSE values for NO, NO2, O3, NOx, CO2, PN10, NH3, PM10, PM2.5, and PM1 were 16.017, 5.092, 5.774, 32.820, 0.790, 12,872.74, 13.474, 0.050, 0.013, and 0.010, respectively [45]. [46] proposed an ANN model to predict CO concentration at the Subang Jaya toll plaza, Selangor, Malaysia. Traffic and meteorological variables were used as inputs to the model. An ANN model was also applied in [47] to predict the CO, CO2, NOx, and HC levels. The ANN performed well, with RMSE values of 0.0930 for CO, 0.080 for CO2, 0.0856 for NOx, and 0.0798 for HC. Furthermore, [48] predicted NO, CO, and HC concentrations using the ANN model. The model performed well, with RMSE values of 1.89, 0.97, and 1.09 for NO, CO, and CO2. Table I presents the variables and size of the dataset used by previous studies. Table II summarizes the performance of the models used in previous research.

III. METHODOLOGY

A. Data and Location
In this paper, air quality, meteorological, and traffic datasets were used. The traffic data was obtained from the Ministry of Works, Malaysia, while the air pollution and meteorological datasets were collected from the Department of Environment (DOE), Malaysia. Each dataset is a set of observations recorded at specific times for sixteen hours daily over three years (2014-2016). The CO, NO, NO2, and NOx features from the air quality dataset and meteorological features such as relative humidity, wind speed, and temperature were used [36]. The traffic dataset consists of traffic volume, the volume of each type of vehicle (taxi and car, bus, van, heavy and light lorries, and motorcycle), time spent on the road, and vehicle speed [49]. It was collected at the Jalan Kepong traffic census station located in Kuala Lumpur, Malaysia.

B. Design and Development of OANN Model
An optimized artificial neural network (OANN) was designed and developed to enhance the existing artificial neural network (ANN) model by updating the initial weights in the network using a Genetic Algorithm (GA). An ANN is an information processing system developed to imitate the human brain's learning and decision-making from experience and examples [50,51]. The structure of the ANN model consists of an input layer, hidden layers, and output layer(s). These layers consist of neurons. Each connection between neurons is associated with a weight and a bias. Each neuron has an activation function that determines its output [52]. The ANN equation is given below [51]:

$$y = f\left(\sum_{i=1}^{n} w_i v_i + b\right) \quad (1)$$

where y is the output of the network, n is the number of neurons in the hidden layers, w_i is the weight of the respective neuron, v_i is the input value of the neuron, f is the activation function, and b is the bias. The learning process of the ANN improves the model's performance during training by updating the weights in the network. The weights of the neurons define how much influence the input has on the output. The initial weights are randomly chosen [53]. An optimization method was used to update these weights and thereby improve the model's accuracy.
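The neuron relation described above (an activation function applied to the weighted sum of the inputs plus a bias) can be illustrated with a minimal Python sketch. The function names `relu` and `neuron_output` and the numeric values are illustrative assumptions, not taken from the study.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)


def neuron_output(weights, inputs, bias, activation=relu):
    """Compute y = f(sum_i w_i * v_i + b) for a single neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    z = sum(w * v for w, v in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical values: three inputs feeding one neuron.
y = neuron_output([0.5, -0.25, 1.0], [2.0, 4.0, 1.0], bias=0.5)
print(y)  # weighted sum is 1.0 - 1.0 + 1.0 + 0.5 = 1.5
```

Changing any weight changes how strongly the corresponding input influences the output, which is why the choice of initial weights matters for training.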
Optimization is the process of making something better, finding the best solution, or making a good decision. In this study, a Genetic Algorithm (GA) was designed and developed to optimize the initial weights of the ANN and improve its performance. A GA works by making changes to candidate solutions in search of an optimal solution to the problem. It operates on a population of chromosomes, and each chromosome has a number of values known as genes [54,55,56].
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 6, 2021
A population in the GA represents the entire network's weights (that is, the weights from the input layer to the hidden layer and the weights from the hidden layers to the output layer). A chromosome represents the weights in one layer (for example, the weights from the input layer to the hidden layer). Finally, a gene represents the weight of a single neuron (for example, one of five neurons in the input layer). Fig. 1 illustrates the representation of the GA in the ANN model. Three vectors are created. The first vector holds the number of chromosomes per population, the second vector holds the size of the population, and the last vector holds the initial population. A new and better population is generated through the following steps:
1) Select the best parents based on their fitness function.
2) Use the selected parents to produce offspring.
3) Produce the offspring using crossover and mutation.
4) Store the generated offspring in a vector (the new population).
5) Generate new populations (based on how many generations are needed).
6) Select one of the best populations.
7) Stop when the optimal solution is found.
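The steps listed above can be sketched as a generational loop. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `evolve`, the toy fitness function, random parent pairing, the mutation rate, and stopping after a fixed number of generations are all hypothetical choices.

```python
import random


def evolve(fitness, n_chromosomes=10, n_genes=6, n_parents=4,
           n_generations=50, mutation_rate=0.1, seed=0):
    """Return the fittest chromosome found after n_generations."""
    rng = random.Random(seed)
    # Initial population: random genes (weights) in [-1, 1].
    population = [[rng.uniform(-1, 1) for _ in range(n_genes)]
                  for _ in range(n_chromosomes)]
    best = max(population, key=fitness)
    for _ in range(n_generations):
        # Step 1: select the fittest chromosomes as parents.
        parents = sorted(population, key=fitness, reverse=True)[:n_parents]
        # Steps 2-4: produce offspring by crossover, then mutation.
        offspring = []
        while len(parents) + len(offspring) < n_chromosomes:
            a, b = rng.sample(parents, 2)  # assumption: random pairing
            cut = n_genes // 2
            child = a[:cut] + b[cut:]
            child = [rng.uniform(-1, 1) if rng.random() < mutation_rate else g
                     for g in child]
            offspring.append(child)
        # Step 5: parents plus offspring form the next population.
        population = parents + offspring
        # Step 6: remember the best chromosome seen so far.
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    # Step 7 (assumed): stop after a fixed number of generations.
    return best


# Toy fitness (hypothetical): genes closer to zero score higher.
toy_fitness = lambda chromosome: -sum(g * g for g in chromosome)
print(evolve(toy_fitness))
```

Because the selected parents are carried into the next population unchanged, the best fitness in this sketch never decreases from one generation to the next.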
A ring pattern was used for selecting parents. For example, if there are ten chromosomes and each chromosome has six genes, the newly generated population will have the same number of chromosomes and genes. If seven chromosomes are found to have the highest fitness values, they are selected, and the remaining three chromosomes are produced using the ring style. The algorithm combines the seventh and the first chromosomes to produce the eighth one, and the first and second chromosomes to produce the ninth one; the same procedure is repeated until the new generation has the required number of chromosomes. An example of initial weights (population) is given in Fig. 2. The selection of parents was done by calculating the fitness value using the fitness function of [57,58], where f is the fitness and c is the chromosome values (genes/weights). The fitness of the example population given in Fig. 2 is calculated in Table III. Based on the table, the first, third, and last chromosomes have the highest fitness values, so these parents are selected to produce new offspring. As discussed earlier, offspring are produced through crossover and mutation. Crossover is the process of combining two selected parents to produce two offspring [56,59]. The first half of the first parent and the last half of the second parent form the first offspring, while the last half of the first parent and the first half of the second parent form the second offspring. Fig. 3 shows the production of offspring using crossover. Mutation was achieved by changing the value of a gene (weight) in the new offspring [57,60]; the altered gene is called a mutant. The gene was altered randomly by replacing it with a lower or higher value than the previous one, within the range -1 to 1. An example of mutation is presented in Fig. 4.
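The ring pairing, half-and-half crossover, and mutation described above might look as follows in Python. This is a sketch under assumptions: the function names are hypothetical, the second offspring keeps each gene in its original position, only the first offspring of each ring pair is kept, and exactly one gene is mutated per offspring.

```python
import random


def crossover(parent1, parent2):
    """Swap halves of two parents to produce two offspring."""
    half = len(parent1) // 2
    first = parent1[:half] + parent2[half:]
    second = parent2[:half] + parent1[half:]
    return first, second


def ring_pairs(parents, n_needed):
    """Pair parents around a ring: (last, first), (first, second), ..."""
    k = len(parents)
    return [(parents[(i - 1) % k], parents[i % k]) for i in range(n_needed)]


def mutate(chromosome, rng, low=-1.0, high=1.0):
    """Return a copy with one randomly chosen gene redrawn from [low, high]."""
    mutant = list(chromosome)
    index = rng.randrange(len(mutant))
    mutant[index] = rng.uniform(low, high)
    return mutant


def next_generation(parents, n_total, rng):
    """Keep the selected parents and fill the rest with mutated offspring."""
    children = []
    for p1, p2 in ring_pairs(parents, n_total - len(parents)):
        child, _ = crossover(p1, p2)  # assumption: keep the first offspring
        children.append(mutate(child, rng))
    return parents + children


# Seven selected parents with six genes each, refilled to ten chromosomes.
rng = random.Random(0)
selected = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(7)]
print(len(next_generation(selected, 10, rng)))
```

With seven parents and three missing chromosomes, `ring_pairs` yields the pairs (seventh, first), (first, second), and (second, third), matching the example in the text.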
After the new population is generated, the algorithm produces further generations (based on how many generations are needed). One of the best generations (optimized weights) is then selected and used as the weights of the ANN model. The flow chart of the Genetic Algorithm (GA) for selecting or producing a new solution is presented in Fig. 5.

C. Implementation of the OANN Model
The developed OANN predictive model was implemented to predict the concentrations of pollutants produced by motor vehicles. The estimated hourly traffic volume, generated vehicle speed and time spent on the road, standardized pollutant values, and meteorological and air quality variables were used to train and test the developed predictive model for traffic emission concentrations. The developed OANN model consists of four layers: one input layer, two hidden layers, and one output layer. The input layer consists of 12 inputs: traffic volume, taxi and car volume, bus volume, van volume, heavy lorries volume, light lorries volume, motorcycle volume, time spent on the road, vehicle speed, relative humidity, wind speed, and temperature. The first hidden layer has ten neurons, and the second hidden layer has 100 neurons. The output layer has only one output, which is the predicted value of the pollutant. An OANN prediction model was created for each of the four pollutants, namely CO, NO, NO2, and NOx. The linear activation function was used for the output layer, and the ReLU activation function was used for the hidden layers.
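The architecture described above (12 inputs, hidden layers of 10 and 100 neurons with ReLU, and one linear output) can be sketched as a plain-Python forward pass. The names `init_network` and `forward`, the random initial weights in [-1, 1], and the zero biases are assumptions for illustration; in the study the initial weights come from the GA.

```python
import random

LAYER_SIZES = [12, 10, 100, 1]  # input, hidden 1, hidden 2, output


def init_network(rng):
    """Create weights[k][j][i]: weight from unit i of layer k to unit j of layer k+1."""
    weights, biases = [], []
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:]):
        weights.append([[rng.uniform(-1, 1) for _ in range(n_in)]
                        for _ in range(n_out)])
        biases.append([0.0] * n_out)  # assumption: biases start at zero
    return weights, biases


def forward(x, weights, biases):
    """Propagate one 12-value input through the network; return the prediction."""
    activations = list(x)
    last = len(weights) - 1
    for k, (layer_w, layer_b) in enumerate(zip(weights, biases)):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(layer_w, layer_b)]
        # ReLU on hidden layers, linear (identity) on the output layer.
        activations = z if k == last else [max(0.0, t) for t in z]
    return activations[0]


weights, biases = init_network(random.Random(0))
sample = [0.5] * 12  # hypothetical standardized feature values
print(forward(sample, weights, biases))
```

One such network would be built per pollutant, since the text describes a separate single-output model for each of CO, NO, NO2, and NOx.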
The weights of the model were optimized using a Genetic Algorithm. Fifty populations were generated, and one of the best populations was selected. The structure of the OANN model is presented in Fig. 6, and the application of the GA in the model is presented in Fig. 7.
The mean absolute error (MAE) measures the average absolute difference between the predicted value and the observed value. The MAE used in this study is given in Equation (3) [61]:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - \hat{x}_i \right| \quad (3)$$

The mean squared error (MSE) is the average of the squared differences between the actual and predicted values. The MSE was calculated using Equation (4) [62]:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2 \quad (4)$$

Here n is the number of predicted values, $\sum$ denotes summation (adding all the values), $\hat{x}_i$ is the predicted value, $x_i$ is the actual value, and $|x_i - \hat{x}_i|$ is the absolute error.
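The two error metrics described above can be computed directly from paired actual and predicted values. The following is a small sketch using the standard definitions of MAE and MSE; the function names and the sample numbers are illustrative and are not results from the study.

```python
def mean_absolute_error(actual, predicted):
    """MAE: average of |x_i - x_hat_i| over all n predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - p) for x, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """MSE: average of (x_i - x_hat_i)^2 over all n predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - p) ** 2 for x, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and predicted concentrations.
observed = [1.0, 2.0, 3.0]
estimated = [1.5, 2.0, 2.0]
print(mean_absolute_error(observed, estimated))  # (0.5 + 0 + 1) / 3 = 0.5
print(mean_squared_error(observed, estimated))   # (0.25 + 0 + 1) / 3
```

MSE penalizes large individual errors more heavily than MAE because each error is squared before averaging.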
An Optimized Artificial Neural Network (OANN) was developed to predict the concentration of traffic emissions in the Jalan Kepong area, Kuala Lumpur. The OANN was enhanced by updating the initial weights of the ANN model using GA to improve the accuracy of the existing ANN model. A total of 1120 initial weights were generated by the model. These weights were from the input layer neurons, first hidden layer neurons, and second hidden layer neurons. There are 12 neurons in the input layer, connected to the first hidden layer. There are ten neurons in the first hidden layer, connected to the second hidden layer. Finally, there are 100 neurons in the second hidden layer, connected to the neuron in the output layer. The generated weights were optimized to improve the ANN model. The GA generated 50 populations of optimized weights, and the best set of weights was selected from these 50 populations. As shown in Fig. 8, the set of weights for predicting the CO pollutant was selected after 33 generations.
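The size of the weight vector handled by the GA follows from the layer sizes. The sketch below assumes fully connected layers and counts connection weights only, without bias terms; the helper names are hypothetical. Under these assumptions, the input-to-hidden and hidden-to-hidden blocks together contain 120 + 1000 = 1120 weights, which matches the figure quoted above, while also counting the 100 connections into the output neuron would give 1220. Which count the study used cannot be confirmed from the text.

```python
LAYER_SIZES = [12, 10, 100, 1]  # input, hidden 1, hidden 2, output


def connection_counts(sizes):
    """Number of weights between each pair of consecutive layers."""
    return [n_in * n_out for n_in, n_out in zip(sizes, sizes[1:])]


def split_chromosome(genes, sizes):
    """Cut a flat list of genes into one block of weights per layer pair."""
    blocks, start = [], 0
    for count in connection_counts(sizes):
        blocks.append(genes[start:start + count])
        start += count
    if start != len(genes):
        raise ValueError("gene count does not match the layer sizes")
    return blocks


counts = connection_counts(LAYER_SIZES)
print(counts)            # [120, 1000, 100]
print(sum(counts[:2]))   # 1120: input->hidden1 plus hidden1->hidden2
print(sum(counts))       # 1220: all three blocks

blocks = split_chromosome(list(range(sum(counts))), LAYER_SIZES)
print([len(block) for block in blocks])
```

Splitting a flat gene list this way is one simple option for mapping an optimized population back onto the network's layers.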
The best set of weights for predicting the NO pollutant was selected after 24 generations. The best optimized weights for NO2 prediction were selected after 38 generations, and the best set of weights for predicting the level of NOx emissions was selected after 45 generations. A sample of the optimized populations/weights is illustrated in Fig. 9. The values highlighted in yellow in Fig. 9 are the optimized weights from the input layer, which consists of 12 neurons, to the first hidden layer with ten neurons. The ten neurons of the first hidden layer are connected to the second hidden layer with 100 neurons. The second (last) hidden layer has 100 neurons connected to the output layer with one neuron. The populations with the optimized weights were chosen for prediction of CO, NO, NO2, and NOx. The set of optimized weights for the prediction of the CO pollutant is presented in Fig. 10.

V. CONCLUSION
An optimized Artificial Neural Network (OANN) model was developed to enhance the existing ANN model by updating the initial weights that connect the neurons in the network using a Genetic Algorithm (GA). The OANN model was used to predict the level of pollutants emitted by vehicles in Kuala Lumpur, Malaysia. The OANN model was evaluated using performance metrics and by comparison with ANN, DT, and RF models. The results show that the developed OANN model performed better than the existing ANN, DT, and RF models, with the lowest regression error metrics among the compared models.
Based on the literature review, the study of [45] predicts the concentration of air pollutants at a street canyon, whereas our study predicts the level of air pollutants at the roadside, which is an open space. The study of [46] is the closest to ours; the difference is that it focuses more on trucks and used a one-month dataset. Additionally, it predicts only the concentration of the CO pollutant, while our study adds three more pollutants, namely NO, NO2, and NOx. The study of [48] used different variables, such as length of the vehicle, vehicle registration, and opacity, and it also has a higher RMSE score than the RMSE values of our study. The developed OANN model can help environmental agencies monitor traffic-related air pollution levels efficiently and take necessary actions to ensure the effectiveness of traffic management policy. Moreover, the model can help decision-makers mitigate traffic emissions to protect the health of citizens living very close to highways.
Admittedly, there are some limitations to this study. Firstly, some studies suggest that different types of fuel produce different types of emissions, but fuel type was not considered in this study. Secondly, only the Jalan Kepong traffic census station was selected; other stations were not considered. Thirdly, although motor vehicles emit many pollutants, the developed OANN model was limited to predicting hourly CO, NO, NO2, and NOx concentrations; it can be applied to other traffic pollutants as well. Furthermore, daily, weekly, monthly, or yearly predictions were not the focus of this research. Prediction using only one type of vehicle was not considered either.