Evolutionary Design of a Carbon Dioxide Emission Prediction Model using Genetic Programming

Air pollution is one of the most serious and dangerous problems affecting our lives and the security of society in many respects. Global warming is related to the emission of carbon dioxide (CO2) from fossil fuels, along with temperature. In this paper, this phenomenon is studied as a step toward preventing and reducing the harmful CO2 emissions that affect society, and toward reducing smoke pollution. The developed model has four input attributes (global oil, natural gas, coal, and primary energy consumption) and one output, the CO2 emission. Genetic Programming (GP), a stochastic search algorithm, was used as an effective and robust tool for building the forecasting model. The training and testing data were taken from the years 1980 to 2002 and 2003 to 2010, respectively. According to the different evaluation criteria, GP performed very well in estimating CO2 emission and is an efficient tool for dealing with climate pollution problems.
Keywords: Fossil fuels; carbon emission; forecasting; genetic programming


I. INTRODUCTION
Weather conditions are an important and potentially dangerous issue that touches on health, climate, agriculture, economics, and tourism. Estimating future events in good time is an important task that helps to reduce and prevent risks and natural disasters. Many researchers have been attracted to this type of problem because of its difficulty and the challenge of handling the different input variables, which must be carefully considered, studied, and measured in order to build accurate forecasting models. Events and processes in the world change constantly with circumstances, so they must be defined and described before they can be processed. Climate pollution related to carbon emission is a serious worldwide problem. Many international environmental agencies have reported an increase in CO2 and greenhouse gas emissions worldwide [1]. Protecting civilization from gas pollution therefore requires a clear and strict policy [2]. Different protocols and agreements have been concluded between numerous countries to minimize greenhouse gas emissions, such as the Kyoto Protocol and the United Nations (UN) agreement, which call for continuous monitoring of the percentage of CO2 in the atmosphere in order to reduce it to the desired levels [3].
Many countries have announced and started new policies to decrease and limit CO2 emission. Pollution from CO2 emission is a serious and real threat to society. For example, the UK Government declared clear plans to cut CO2 emissions to 10% from the 1990 base by 2010 and, equivalently, to generate 10% of the UK's electricity from renewable sources by 2010. Renewable electricity has thus become closely tied to CO2 reduction [4]. Different studies have investigated the relationship between the consumption of different energy sources and CO2 emission [5]-[9].
In this paper, the stochastic search algorithm Genetic Programming (GP) is used as an effective and powerful tool for building the forecasting model. As a soft computing technique, GP has been widely used in different fields to solve complicated problems, such as forecasting of all types (weather, rain, rivers, carbon, etc.) [10]-[13]. GP has also been used efficiently in many applications [14], [15], such as economics and sales estimation [16], shift failures [17], price estimation [18], and stock returns [19]. In this study, the GP technique is applied to an important and dangerous phenomenon, the emission of CO2 gas, based on four related inputs: the global oil, natural gas (NG), coal, and primary energy (PE) consumption. This paper is organized as follows. Section II describes the collected data. Section III introduces the genetic programming concepts. Section IV presents the evaluation criteria used. Section V describes the genetic programming model. Section VI describes the experimental results. Finally, Section VII presents the conclusions and future work.

II. COLLECTED DATA
The carbon dioxide data set was collected from [20], as shown in Table I. The data set covers 31 years, from 1980 to 2010. The model was trained on 23 years of data (1980 to 2002) and tested on eight years (2003 to 2010). This work is an extension of the previous work published in [12], which used a neural network algorithm.
www.ijacsa.thesai.org

III. GENETIC PROGRAMMING CONCEPT
GP is a stochastic search algorithm that belongs to the family of evolutionary algorithms. It is driven by the principles of Darwinian evolution and natural selection [21], [22]. GP generates a mathematical model for a nonlinear system in the form of a tree, where the internal nodes hold the mathematical operations (functions) and the leaf nodes hold the variables (terminals). The depth of the resulting tree depends on the functional complexity of the model. An example of a GP tree structure is shown in Fig. 1. GP encodes a computer program in the form of a tree structure and evaluates its fitness with respect to a predefined task. In 1991, John Koza suggested using LISP programs, which can manipulate a variety of data and structures, because of the language's flexibility. GP maintains a population of n candidate programs; the initial population is generated randomly, and n is chosen according to the problem. Fig. 2 shows the evolutionary process of GP.

IV. EVALUATION CRITERIA
In this paper, to solve the modeling problem for CO2 estimation, we built a model structure that takes into account the historical measurements of the carbon data from previous years.
The GP model was developed using GPTIPS, an open-source MATLAB toolbox for multigene GP (MG-GP) [23]. GPTIPS provides a number of functions for exploring the population of evolved models, such as examining model behavior, simplifying a model after a run, and exporting a model to formats such as a graphics file, a LaTeX expression, a symbolic math object, or a standalone MATLAB file [20]. GPTIPS is distinguished by its ability to be configured to evolve multigene individuals.
A number of evaluation criteria were used to validate the developed model. These criteria are the Variance-Accounted-For (VAF), Mean Square Error (MSE), Euclidean Distance (ED), Manhattan Distance (MD), and Mean Magnitude of Relative Error (MMRE), as shown in the following equations.
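As an illustration, the five criteria can be computed as in the Python sketch below. It uses the commonly cited definitions of these measures: VAF = [1 - var(y - ŷ) / var(y)] × 100%, MSE as the mean of the squared errors, ED as the square root of the sum of squared errors, MD as the sum of absolute errors, and MMRE as the mean of |y - ŷ| / y. The exact normalizations used in this paper (for example, population versus sample variance in VAF) are an assumption here, and the function names are illustrative only.

```python
import math


def _variance(values):
    """Population variance (assumed normalization: divide by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def vaf(y, y_hat):
    """Variance-Accounted-For, in percent."""
    residuals = [a - b for a, b in zip(y, y_hat)]
    return (1.0 - _variance(residuals) / _variance(y)) * 100.0


def mse(y, y_hat):
    """Mean Square Error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def euclidean_distance(y, y_hat):
    """Square root of the sum of squared errors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))


def manhattan_distance(y, y_hat):
    """Sum of absolute errors."""
    return sum(abs(a - b) for a, b in zip(y, y_hat))


def mmre(y, y_hat):
    """Mean Magnitude of Relative Error (requires nonzero actual values)."""
    return sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)


if __name__ == "__main__":
    # Made-up numbers for illustration only; these are not the paper's data.
    actual = [10.0, 20.0, 30.0, 40.0]
    estimated = [12.0, 18.0, 33.0, 39.0]
    for name, fn in [("VAF", vaf), ("MSE", mse), ("ED", euclidean_distance),
                     ("MD", manhattan_distance), ("MMRE", mmre)]:
        print(name, fn(actual, estimated))
```

A VAF close to 100% together with small MSE, ED, MD, and MMRE values indicates that the estimated series tracks the actual series closely.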

V. GENETIC PROGRAMMING (GP) MODEL
The developed GP model requires some important parameters to be defined and initialized at the beginning of the evolutionary process. These parameters include the population size, the selection mechanism, the crossover and mutation probabilities, the maximum number of genes allowed in a multigene individual, and others. The tuning parameters of the developed GP model are given in Table II.
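To make the role of these parameters concrete, the following Python sketch shows a minimal single-gene GP loop with tournament selection, subtree crossover, subtree mutation, elitism, and a maximum tree depth. It is not GPTIPS and not the model used in this paper: the function set, the default parameter values, and the toy target are all placeholders chosen for illustration and do not correspond to the values in Table II.

```python
import operator
import random

# Binary function set; terminals are ("x", index) or ("c", constant).
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(rng, n_inputs, max_depth):
    """Grow a random expression tree (nested tuples) no deeper than max_depth."""
    if max_depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_inputs))
        return ("c", rng.uniform(-1.0, 1.0))
    op = rng.choice(sorted(FUNCS))
    return (op,
            random_tree(rng, n_inputs, max_depth - 1),
            random_tree(rng, n_inputs, max_depth - 1))


def evaluate(tree, x):
    """Evaluate a tree on one input vector x."""
    tag = tree[0]
    if tag == "x":
        return x[tree[1]]
    if tag == "c":
        return tree[1]
    return FUNCS[tag](evaluate(tree[1], x), evaluate(tree[2], x))


def depth(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if tree[0] in FUNCS:
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 0


def fitness(tree, X, y):
    """Mean squared error over the data set (lower is better)."""
    return sum((evaluate(tree, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)


def node_paths(tree, path=()):
    """List the path (sequence of child indices) to every node."""
    paths = [path]
    if tree[0] in FUNCS:
        paths += node_paths(tree[1], path + (1,))
        paths += node_paths(tree[2], path + (2,))
    return paths


def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def set_node(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = set_node(tree[path[0]], path[1:], new)
    return tuple(parts)


def tournament(rng, pop, fits, size):
    """Pick the fittest of `size` randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), size)
    return pop[min(picks, key=lambda i: fits[i])]


def evolve(X, y, pop_size=60, generations=25, tournament_size=4,
           p_crossover=0.8, p_mutation=0.1, max_depth=4, seed=0):
    """Run the loop; return the best tree and the best error per generation."""
    rng = random.Random(seed)
    n_inputs = len(X[0])
    pop = [random_tree(rng, n_inputs, max_depth) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fits = [fitness(t, X, y) for t in pop]
        best = pop[min(range(pop_size), key=lambda i: fits[i])]
        history.append(min(fits))
        new_pop = [best]  # elitism: the best individual always survives
        while len(new_pop) < pop_size:
            child = tournament(rng, pop, fits, tournament_size)
            r = rng.random()
            if r < p_crossover:  # subtree crossover with a second parent
                mate = tournament(rng, pop, fits, tournament_size)
                donor = get_node(mate, rng.choice(node_paths(mate)))
                child = set_node(child, rng.choice(node_paths(child)), donor)
            elif r < p_crossover + p_mutation:  # subtree mutation
                fresh = random_tree(rng, n_inputs, 2)
                child = set_node(child, rng.choice(node_paths(child)), fresh)
            if depth(child) <= max_depth:  # enforce the depth limit
                new_pop.append(child)
        pop = new_pop
    fits = [fitness(t, X, y) for t in pop]
    best = pop[min(range(pop_size), key=lambda i: fits[i])]
    history.append(min(fits))
    return best, history
```

Because the best individual is copied unchanged into each new generation, the recorded best error can never increase from one generation to the next. A full multigene implementation would additionally evolve several trees per individual and fit their weights.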
The complexity of the evolved models changes with the maximum tree depth. Restricting the tree depth helps to evolve simple models, but it may also reduce the performance of the evolved model. Thus, a balance must be kept between depth, complexity, and the required performance. The GP model is shown in Fig. 3, where four inputs (the global oil, natural gas, coal, and primary energy consumption) are applied to the model to estimate the output, the CO2 emission.
Multigene symbolic regression is a distinctive modification of the GP algorithm in which each symbolic model is represented by a weighted linear combination of a number of GP trees [24]. In multigene GP, every tree is considered a "gene" by itself. The predicted output $\hat{y}$ is formed by adding the weighted outputs of the trees/genes in the multigene individual to a bias term. Each tree is a function of zero or more of the N input variables $z_1, \dots, z_N$. Mathematically, a multigene regression model can be written as:

$$\hat{y} = \gamma_0 + \gamma_1 \, \mathrm{Tree}_1 + \dots + \gamma_M \, \mathrm{Tree}_M$$

where $\gamma_0$ represents the bias or offset term, $\gamma_1, \dots, \gamma_M$ are the gene weights, and M is the number of genes (i.e., trees) that constitute the individual. An example of a multigene model is shown in Fig. 4, and its mathematical model is given in (7).
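The way a multigene individual produces its prediction can be sketched in Python as follows. Each gene is a small expression tree over the inputs, and the prediction is the bias plus the weighted sum of the gene outputs. The tuple encoding of trees, the function names, and the example genes and weights are all hypothetical; they are not the model evolved in this paper.

```python
import operator

# Binary function set; terminals are ("x", index) or ("c", constant).
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_tree(tree, z):
    """Evaluate one gene (expression tree) on the input vector z."""
    tag = tree[0]
    if tag == "x":
        return z[tree[1]]
    if tag == "c":
        return tree[1]
    return FUNCS[tag](eval_tree(tree[1], z), eval_tree(tree[2], z))


def multigene_predict(bias, weights, genes, z):
    """Bias plus the weighted sum of the gene outputs."""
    if len(weights) != len(genes):
        raise ValueError("one weight is required per gene")
    return bias + sum(w * eval_tree(g, z) for w, g in zip(weights, genes))


# Hypothetical two-gene individual: gene 1 is z1, gene 2 is z2 * z3.
genes = [("x", 0), ("*", ("x", 1), ("x", 2))]
weights = [3.0, 0.5]
bias = 2.0

# Made-up input vector with four entries, mirroring the four model inputs.
z = (1.0, 2.0, 4.0, 9.0)
prediction = multigene_predict(bias, weights, genes, z)
```

Here the fourth input is not used by any gene, which illustrates that a tree may depend on only some of the input variables. In practice the gene weights and bias are usually fitted to the training data rather than set by hand; the fitting step is omitted from this sketch.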

VI. EXPERIMENTAL RESULTS
In this paper, the GP model was used to estimate the carbon dioxide emission. Four inputs were used: the oil consumption (X1), NG consumption (X2), coal consumption (X3), and PE consumption (X4). The output is the CO2 emission (y). The inputs were measured in Mtoe and the output in Mt. The proposed GP model performed very well: the estimated CO2 values were very close to the actual values in both the training and testing cases, as shown in Tables III and IV. The model was trained on 23 years of data (1980 to 2002) and tested on 8 years (2003 to 2010). Fig. 5 shows the correlation coefficient of the proposed model. Fig. 6 shows the convergence of the GP model. Fig. 7 and Fig. 8 show the actual and estimated CO2 emission for the training and testing cases.
The mathematical equation evolved for prediction using multigene GP is given in (8). The model structure shows a strong linear relationship between the CO2 emission and the three main attributes (global oil, natural gas, and coal consumption), while primary energy consumption was not a significant feature in the modeling process.
Table V reports the error values computed with the validation criteria for both the training and testing cases.

VII. CONCLUSIONS AND FUTURE WORK
In this paper, we presented an evolutionary model based on multigene GP to predict carbon dioxide emission, and we compared the obtained results with the actual values, for both training and testing cases, to measure the efficiency and strength of the GP algorithm in forecasting. The results show that the developed model is quite accurate and that GP is a robust and efficient tool for estimating CO2 emission. We plan to extend our research to include other paradigms of evolutionary modeling to solve various related environmental problems.

TABLE III. ACTUAL AND ESTIMATED CO2: TRAINING CASE