Multiobjective Optimization for the Forecasting Models on the Base of the Strictly Binary Trees

This paper considers the optimization problem that arises in the development of forecasting models on the base of strictly binary trees. The aim of the paper is a comparative analysis of two optimization variants applied to the development of the forecasting models. The first variant uses a single quality indicator of the forecasting model, the affinity indicator; the second variant uses two quality indicators, the affinity indicator and the tendencies discrepancy indicator. In both variants the search for the best forecasting models is carried out by means of the modified clonal selection algorithm. To maintain a high variety of the population of forecasting models, the second variant takes the values of the crowding distance into account. The results of experimental studies confirming the efficiency of the modified clonal selection algorithm on the base of the second optimization variant are given.
Keywords: forecasting model; strictly binary tree; modified clonal selection algorithm; multiobjective optimization; affinity indicator; tendencies discrepancy indicator


I. INTRODUCTION
The main problem in the development of forecasting models is the right choice of the best forecasting model. The forecasting model based on strictly binary trees (SBT) and the modified clonal selection algorithm (MCSA) [1,2] is presented in the form of an antibody, which is coded by a line of symbols randomly selected from the corresponding alphabets. This antibody can be transformed into an analytical dependence, which is used for forecasting a time series (TS). Obviously, the correct selection of antibodies is very important for the effective use of the MCSA [1-6].
The traditional approach to the choice of short-term forecasting models consists in estimating the quality of the forecasting models by means of the average forecasting error rate (AFER), calculated for the training data sequence. Herewith the AFER should be minimized [1-6]. However, the use of the AFER as the only quality indicator of the forecasting model is not always sufficient to determine the best forecasting model. Often it is required to consider additional quality indicators of the forecasting model, such as the compliance with the seasonal tendencies of the TS, the compliance with the trend of the TS, the absence of outliers, the complexity of the forecasting model, etc. [6]. It is expedient to use, along with the AFER, an additional quality indicator that estimates the general tendency of change in the values of the known TS elements (for example, the tendencies discrepancy indicator) [6]. The efficiency of the forecasting models on the base of the SBT can be increased by using the multiobjective MCSA for the problem of medium-term forecasting. Herewith the affinity indicator based on the AFER and the tendencies discrepancy indicator can be used as the objective functions.
The rest of this paper is structured as follows. Section 2 presents the main ideas of the original MCSA. Section 3 details the multiobjective optimization variant for the MCSA. Experimental results comparing two optimization variants (with the original MCSA and with the multiobjective MCSA) follow in Section 4. Finally, conclusions are drawn in Section 5.

II. THE MAIN IDEAS OF THE MODIFIED CLONAL SELECTION ALGORITHM
The MCSA simulates the natural laws of the immune system functioning and provides the formation of quite complex analytical functions [1], [2]. The principles of developing forecasting models of k-order with the use of the MCSA were investigated in [2]. The MCSA allows forming, at acceptable time expenses, an analytical dependence on the base of the SBT that describes certain TS values and provides a minimum value of the affinity indicator Aff based on the AFER:
$$\mathrm{Aff} = \mathrm{AFER} = \frac{1}{n-k}\sum_{j=k+1}^{n}\left|\frac{f^{j}-d^{j}}{d^{j}}\right| \cdot 100\%, \qquad (1)$$
where $d^{j}$ and $f^{j}$ are respectively the actual (fact) and forecasted values for the j-th element of the TS; n is the number of TS elements; k is the model order.
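As an illustration only, the affinity computation can be sketched in Python, assuming the common AFER form (the mean absolute relative error in percent, averaged over the elements that a model of order k can forecast). The function name `affinity` and the parameter `k` are hypothetical and are not taken from [1], [2].

```python
from typing import Sequence


def affinity(actual: Sequence[float], forecast: Sequence[float], k: int = 0) -> float:
    """Average forecasting error rate (AFER) in percent.

    Assumption: the first k elements are skipped, because a model of
    order k needs k preceding values before it can produce a forecast.
    """
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    n = len(actual)
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < n")
    # Sum of absolute relative errors over the forecastable elements.
    total = sum(abs((f - d) / d) for d, f in zip(actual[k:], forecast[k:]))
    return total / (n - k) * 100.0
```

A smaller returned value means a better antibody in the sense of the first optimization variant. Actual values equal to zero would need separate handling, which this sketch does not attempt.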
The possible variants of analytical dependences are presented in the form of antibodies Ab, which recognize antigens Ag (the TS values). The antibody Ab that provides the minimum value of the affinity indicator Aff is selected as "the best one". The antibody coding is carried out by recording signs in a line. These signs are selected from three alphabets: the alphabet of arithmetic operations Operation, the alphabet of functional signs Functional, and the alphabet of terminals Terminal, in which one of the signs defines a constant. The use of these alphabets provides a correct conversion of randomly generated antibodies into the analytical dependence. The structure of these antibodies can be described with the help of the SBT.
The number of signs from the alphabet of terminals Terminal in the antibody Ab determines the maximal possible order K of the model, with $k \le K$, where k is the real model order; i.e. to obtain the value of the element $d^{j}$ of the forecasted TS at the j-th moment of time, K preceding values of the TS elements can be used: $d^{j-1}, d^{j-2}, \ldots, d^{j-K}$.
The use of the SBT type illustrated in Fig. 1 allows building a complex analytical dependence and provides high accuracy of TS forecasting [1], [2]. This SBT can be created as the composition of one "left" subtree of the maximum possible order $K = 3$ and some "right" subtrees of the maximum possible order $K = 2$. The term "left" ("right") subtree is used for the branch (left or right) of the SBT level where a new subtree should be included.
It is rational to form antibodies by subdividing the SBT into subtrees, then executing the subtree-walk of each subtree to form the ordered lists of signs on its vertices, and then combining these lists consecutively. To form the ordered list of signs on the base of a subtree, a consecutive double subtree-walk is carried out: first, moving over the subtree bottom-up and left to right, the vertices containing the terminal signs from the alphabet Terminal are visited in pairs together with the vertices placed above them, which contain signs from the functional alphabet Functional; then, moving in the same direction, the vertices containing the arithmetic operation signs from the alphabet Operation are visited in pairs together with the vertices placed above them, which contain signs from the functional alphabet Functional. The first two signs of such an antibody contain the pair of the zero level of the SBT from the functional alphabet Functional and the arithmetic operation alphabet Operation. Then follow the lists of signs corresponding to the "right" subtrees of the maximum possible order $K = 2$ (moving over the SBT bottom-up) and finally the list of signs of the "left" subtree of the maximum possible order $K = 3$. For example, the antibody formed on the base of the SBT shown in Fig. 1 is coded by the line of signs […], which can be transformed into the analytical dependence for the forecasting model with $k = 4$: […]
When interpreting the antibodies into analytical dependences, it is rational to use the recursive procedure of interpretation [2].
The MCSA applied to the search for "the best" antibody, which defines "the best" analytical dependence, includes a preparatory part (the formation of the initial antibody population) and an iterative part. The iterative part comprises: ordering the antibodies in ascending order of the affinity Aff; selection and cloning of the part of "the best" antibodies, which are characterized by the least value of the affinity Aff; hypermutation of the antibody clones; self-destruction of the antibody clones "similar" to the other clones and antibodies of the current population; calculation of the affinity of the antibody clones and formation of the new antibody population; suppression of the population received; generation of new antibodies and their addition to the current population until it reaches its initial size; and the test of the MCSA completion condition.
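The idea of recursive interpretation can be illustrated with a small Python sketch that evaluates a strictly binary expression tree. The nested-tuple node layout, the function names and the argument names `a`, `b` below are hypothetical and chosen for readability; they are not the antibody coding of [2], which works on a line of signs.

```python
import math
from typing import Callable, Dict

# Hypothetical node layout (NOT the antibody coding from the paper):
#   leaf     -> ("leaf", function_name, terminal)   terminal: argument name or constant
#   internal -> ("node", function_name, operation, left_subtree, right_subtree)
FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "id": lambda x: x,   # absence of any function
    "sin": math.sin,
    "cos": math.cos,
}
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def evaluate(node: tuple, args: Dict[str, float]) -> float:
    """Recursively evaluate a strictly binary tree: every internal
    vertex has exactly two children, every leaf holds a terminal."""
    if node[0] == "leaf":
        _, func, terminal = node
        value = args[terminal] if isinstance(terminal, str) else float(terminal)
        return FUNCTIONS[func](value)
    _, func, op, left, right = node
    combined = OPERATIONS[op](evaluate(left, args), evaluate(right, args))
    return FUNCTIONS[func](combined)


# a + (b * 2), where a and b stand for preceding TS values
tree = ("node", "id", "+",
        ("leaf", "id", "a"),
        ("node", "id", "*", ("leaf", "id", "b"), ("leaf", "id", 2.0)))
result = evaluate(tree, {"a": 1.0, "b": 3.0})
```

In a forecasting setting the arguments would be bound to the preceding TS values, so one call to `evaluate` yields one forecasted element.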

III. MULTIOBJECTIVE OPTIMIZATION
The average forecasting error rate AFER, which is also called the affinity indicator Aff (in the context of working with the MCSA), can be used as the first quality indicator for the forecasting models. The rate of discrepancy between the tendencies of two time series (the tendencies discrepancy indicator Tendency) can be used as the second quality indicator for the forecasting models [6].
The tendencies discrepancy indicator Tendency can be calculated as: […] (2), where h is the number of negative products $(f^{j} - f^{j-1})(d^{j} - d^{j-1})$; $d^{j}$ and $f^{j}$ are respectively the actual (fact) and forecasted values for the j-th element of the TS; n is the number of TS elements; r is the […]. This indicator allows adapting the forecasting models on the base of the SBT and MCSA for medium-term forecasting.
The affinity indicator (1) and the tendencies discrepancy indicator (2) must be used simultaneously in the quality assessment of the forecasting models on the base of the SBT and MCSA to solve the problem of medium-term forecasting.
www.ijacsa.thesai.org
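A hedged Python sketch of such an indicator follows. It assumes that h counts the consecutive steps at which the actual and the forecasted series change in opposite directions (a negative product of the two increments). The normalization by the number of consecutive pairs is an assumption of this sketch and is not taken from [6]; the function name is hypothetical as well.

```python
from typing import Sequence


def tendency_discrepancy(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Share of consecutive steps at which the actual and forecasted
    series move in opposite directions (0.0 = tendencies always agree).

    Assumption: h is divided by the number of consecutive pairs; the
    exact denominator of the published indicator may differ.
    """
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    if len(actual) < 2:
        raise ValueError("at least two elements are required")
    h = 0
    for j in range(1, len(actual)):
        # A negative product means the two increments have opposite signs.
        if (forecast[j] - forecast[j - 1]) * (actual[j] - actual[j - 1]) < 0:
            h += 1
    return h / (len(actual) - 1)
```

For example, for the actual series 1, 2, 3, 2 and the forecast 1, 3, 2, 1 only the second step has opposite directions, so the sketch returns one third.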
Various well-proven approaches can be applied to the problem of simultaneously accounting for two quality indicators in the development of the forecasting models. Among them, the approach based on multiobjective optimization algorithms, including evolutionary algorithms, deserves special attention. In recent years a number of multiobjective evolutionary algorithms (MOEA) have been suggested [7]-[15]. The main reason for this is their ability to find multiple Pareto-optimal solutions in one single simulation run. These algorithms work with a population of solutions. Therefore, the primary attention has to be paid to maintaining the diversity and spread of solutions. Such MOEAs make it possible to take several objective functions (quality indicators) into account in the analysis of various applied problems. The multiobjective genetic algorithms (MOGA) [7]-[11] are the best known multiobjective optimization algorithms. The multiobjective clonal selection algorithms (MOCSA) [12]-[15] should also be mentioned. However, these algorithms are less developed and for the most part borrow the principles of multiobjective optimization from the genetic algorithms. This borrowing is possible because the MOGA and MOCSA share many similar mechanisms of the evolutionary process. The analysis of the merits and demerits of the MOEAs shows that MOGAs such as the NSGA-II and the NSGA-III are significantly better than others because they can successfully solve more difficult multiobjective optimization problems [6].
In this regard, it was decided to adapt the ideas of the NSGA-II to the multiobjective MCSA, which is applied to the selection of the forecasting models on the base of the SBT. Herewith, at the realization of the multiobjective optimization algorithm, the forecasting model (and the antibody corresponding to it) is understood as the solution, and the quality indicator of the forecasting model as the objective function. Using the notion of "Pareto-dominance", all forecasting models can be divided into dominated and nondominated models [6].
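The s-th forecasting model is dominated by the z-th forecasting model if the values of all quality indicators of the z-th model are not worse than the corresponding values of the s-th model, and there is at least one $v^{*}$-th indicator for which the value of the z-th model is strictly better. The ranks and the crowding distances $\delta_s$ of the forecasting models can be calculated using the following algorithm [10,11].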
Step 1. To calculate the ranks of all forecasting models in the population. To unite the models with identical rank values into one group.
Step 2. For every group of the forecasting models:
• to sort the forecasting models according to each quality indicator value in ascending order of magnitude;
• to assign the infinite distance to the boundary forecasting models in the group, i.e. […];
• to calculate the crowding distance $\delta_s$ as:
$$\delta_s = \sum_{v=1}^{V} \frac{\left|Q_v^{s+1} - Q_v^{s-1}\right|}{Q_v^{\max} - Q_v^{\min}},$$
where $Q_v^{s}$ is the value of the v-th quality indicator for the s-th forecasting model, $Q_v^{s-1}$ and $Q_v^{s+1}$ are the values of this indicator for the nearest "neighbors" of the s-th model, and $Q_v^{\max}$ and $Q_v^{\min}$ are the worst and the best values of the v-th quality indicator.
Fig. 2 shows how the crowding distance can be calculated on the base of two quality indicators. The points marked with solid circles correspond to the models with the minimum (zero) value of the rank (i.e. these points correspond to the Pareto front with the zero rank). To calculate the crowding distance for the s-th forecasting model it is required to define the values of both quality indicators for the $(s-1)$-th and the $(s+1)$-th models, which are the nearest "neighbors" of the s-th model and have the same rank. Also, it is necessary to define the best and worst values of each quality indicator.
At the realization of the multiobjective MCSA the s-th forecasting model is better than the z-th forecasting model if its rank is less than the rank of the z-th model, or if the ranks are equal and the crowding distance of the s-th model is greater. If the s-th forecasting model is better than the z-th forecasting model, the s-th forecasting model is the candidate for transfer into the new generation.
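The ranking, the crowding distance and the comparison of two models can be sketched in Python in the style of the NSGA-II. This is a minimal sketch under stated assumptions: all quality indicators are minimized, the ranks are found by repeatedly peeling off the nondominated models, and the distances are normalized by the indicator range within each rank group. All names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]  # indicator values of one model, all minimized


def dominates(a: Point, b: Point) -> bool:
    """a is not worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def ranks(points: List[Point]) -> List[int]:
    """Rank 0 is the nondominated front; rank r is the front that
    remains after all models of lower rank have been removed."""
    rank = [-1] * len(points)
    remaining = set(range(len(points)))
    current = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in front:
            rank[i] = current
        remaining -= front
        current += 1
    return rank


def crowding_distances(points: List[Point], rank: List[int]) -> List[float]:
    """Crowding distance of every model inside its own rank group."""
    dist = [0.0] * len(points)
    n_ind = len(points[0]) if points else 0
    for r in set(rank):
        group = [i for i, ri in enumerate(rank) if ri == r]
        for v in range(n_ind):
            group.sort(key=lambda i: points[i][v])
            lo, hi = points[group[0]][v], points[group[-1]][v]
            # Boundary models of the group get the infinite distance.
            dist[group[0]] = dist[group[-1]] = math.inf
            if hi == lo:
                continue
            for pos in range(1, len(group) - 1):
                gap = points[group[pos + 1]][v] - points[group[pos - 1]][v]
                dist[group[pos]] += gap / (hi - lo)
    return dist


def better(s: int, z: int, rank: List[int], dist: List[float]) -> bool:
    """Lower rank wins; for equal ranks the less crowded model wins."""
    return rank[s] < rank[z] or (rank[s] == rank[z] and dist[s] > dist[z])
```

The quadratic-time peeling in `ranks` is chosen for brevity; the fast nondominated sort of the NSGA-II gives the same ranks with less work on large populations.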
To confirm the prospects of the offered transformation of the MCSA, it is proposed to implement the following multiobjective optimization algorithm [6].
Step 1. To generate the initial population of antibodies. Each antibody is coded on the base of the SBT and represents some forecasting model.
Step 2. To perform the nondominated sorting of the population of antibodies on the base of two quality indicators of the forecasting model (the affinity indicator (1) and the tendencies discrepancy indicator (2)).
Step 3. To choose the parent antibodies for the next generation of the clone antibodies based on the values of the rank and the crowding distance.
Step 4. To pass to step 5 if the desirable values of the quality indicators are reached or the specified number of generations of the MCSA has been completed. Otherwise to pass to step 2.
Step 5. To accept the antibody with the minimum value of the affinity indicator (1) in the last population as the optimum solution. To use the forecasting model corresponding to this antibody for the forecasting.
As a result of the application of the offered multiobjective clonal selection algorithm, the Pareto set of the nondominated forecasting models is obtained. These models provide the best combinations of values of the used quality indicators of the forecasting models.
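The final selection can be illustrated with a short Python sketch that extracts the Pareto set from the last population and then takes the model with the minimum affinity. The tuple layout `(label, Aff, Tendency)` and the function names are hypothetical. Restricting the search to the nondominated set is a choice of this sketch: it yields the same minimum Aff as a search over the whole population and, among models with equal Aff, prefers the one with the smaller Tendency.

```python
from typing import List, Tuple

Model = Tuple[str, float, float]  # (label, Aff, Tendency), both indicators minimized


def pareto_set(models: List[Model]) -> List[Model]:
    """Models that are not dominated by any other model."""
    def dominated(m: Model, other: Model) -> bool:
        return (other[1] <= m[1] and other[2] <= m[2]
                and (other[1] < m[1] or other[2] < m[2]))
    return [m for m in models if not any(dominated(m, o) for o in models)]


def final_model(models: List[Model]) -> Model:
    """Nondominated model with the minimum affinity indicator."""
    return min(pareto_set(models), key=lambda m: m[1])


# Illustrative values only, not experimental results.
population = [("m1", 5.0, 0.40), ("m2", 4.0, 0.50),
              ("m3", 6.0, 0.20), ("m4", 6.5, 0.45)]
front = pareto_set(population)
chosen = final_model(population)
```

Here `m4` is dominated by `m1`, so the Pareto set consists of `m1`, `m2` and `m3`, and `m2` is chosen because it has the smallest Aff.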

IV. EXPERIMENTAL STUDIES
Both variants of optimization have been applied to the development of the forecasting models intended for forecasting the number of references to the names of E-Commerce systems in the requirements of vacancies posted on the websites of two well-known recruiting network services, HeadHunter.ru (Russia) and Indeed.com (USA). The obtained forecasting results can be used for the analysis of tendencies of the labour market. Each of the analyzed time series contains information on the number of vacancies which include a specific keyword (Magento, OpenCart, PrestaShop, Hybris, Demandware). This keyword defines the name of an E-Commerce system for the development of online stores.
The first 59 values and the last 5 values of each TS were used as the training data sequence and the test data sequence respectively. The forecasting models were developed for each TS with the use of the MCSA on the base of the two variants of optimization (Table 1). The forecasting results obtained with these models for the training and test data sequences are shown in Fig. 3 and Fig. 4. The averaged values of the relative forecasting errors at the 5 steps, the averaged values of the affinity indicator and the averaged values of the tendencies discrepancy indicator obtained from the results of 10 runs of the MCSA for each TS are presented in Table 2. The averaged values of the relative forecasting errors at the 5 steps and the averaged values of the affinity indicator over all TSs are presented graphically in Fig. 5, a.
The second optimization variant is more effective both for short-term forecasting (1 to 3 steps ahead) and for medium-term forecasting (4 and 5 steps ahead). Herewith the second optimization variant not only yields a smaller value of the tendencies discrepancy indicator Tendency in comparison with the first optimization variant (Table 3 and Fig. 6), but also in many cases reduces the value of the affinity indicator Aff (Fig. 5, b).
Thus, further research on improving the multiobjective optimization algorithms for the purpose of their application to the search for an adequate forecasting model of a TS is expedient.
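The evaluation protocol described in this section (the last 5 values of each TS held out as the test data sequence, and a relative error computed at each forecasting step) can be sketched in Python as follows. The function names are hypothetical, and the relative error is assumed to be the absolute percentage error of a single element.

```python
from typing import List, Sequence, Tuple


def split_series(series: Sequence[float], test_size: int = 5) -> Tuple[List[float], List[float]]:
    """Hold out the last test_size values as the test data sequence."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be positive and less than the series length")
    return list(series[:-test_size]), list(series[-test_size:])


def relative_errors(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """Relative forecasting error in percent at each step ahead
    (assumed form: absolute percentage error of one element)."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    return [abs((f - d) / d) * 100.0 for d, f in zip(actual, forecast)]


# A TS of 64 elements gives 59 training values and 5 test values.
train, test = split_series(list(range(1, 65)))
```

Averaging the per-step errors over several runs of the algorithm would give values comparable in kind to those reported per step in this section.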

Fig. 1. An example of a strict binary tree, which is used to form antibodies

Let $Q_v^{s}$ be the value of the v-th quality indicator for the s-th forecasting model ($v = \overline{1,V}$; $s = \overline{1,S}$); let V be the quantity of the quality indicators of the forecasting model; let S be the quantity of the forecasting models. The s-th forecasting model is dominated by the z-th forecasting model if the following conditions are satisfied: $Q_v^{z} \le Q_v^{s}$ for all quality indicators ($v = \overline{1,V}$), and $Q_{v^{*}}^{z} < Q_{v^{*}}^{s}$ for at least one $v^{*}$-th indicator.

Fig. 2. The points used for the crowding distance calculation
The crowding distances $\delta_s$ ($s = \overline{1,S}$) for the s-th forecasting model on the base of two quality indicators can be calculated as [6]: […]

Fig. 3. The forecasting of TSs, determining the number of references of E-Commerce systems for HeadHunter.ru (Russia)
Fig. 5. The comparison of the averaged values of the forecast errors at the 5 steps and the averaged values of the affinity indicator for two variants of optimization
Seven TSs have been considered: 4 TSs with information on vacancies in Russia and 3 TSs with information on vacancies in the USA, marked (in brackets after the name of a keyword) respectively as RF and USA.

TABLE II. THE AVERAGED VALUES OF THE FORECASTING ERRORS AT THE 5 STEPS AND THE AVERAGED VALUES OF THE QUALITY INDICATORS (AFF (AFER)