Convolutional Neural Network Hyper-Parameters Optimization based on Genetic Algorithms

In machine learning for computer vision applications, the Convolutional Neural Network (CNN) is the most widely used technique for image classification. Despite the efficiency of these deep neural networks, choosing their optimal architecture for a given task remains an open problem. In fact, CNN performance depends on many hyper-parameters, namely the network depth, the number of convolutional layers, the number of filters and their respective sizes. Many CNN structures have been manually designed by researchers and then evaluated to verify their efficiency. In this paper, our contribution is to propose an innovative approach, labeled Enhanced Elite CNN Model Propagation (Enhanced E-CNN-MP), to automatically learn the optimal structure of a CNN. To traverse the large search space of candidate solutions, our approach is based on Genetic Algorithms (GA), meta-heuristic algorithms well known for non-deterministic problem resolution. Simulations demonstrate the ability of the designed approach to compute optimal CNN hyper-parameters for a given classification task. The classification accuracy of the CNN designed with the Enhanced E-CNN-MP method exceeds that of public CNNs, even when the latter use the Transfer Learning technique. Our contribution advances the current state by offering scientists, regardless of their field of research, the ability to design optimal CNNs for any particular classification problem.

Keywords—Machine learning; computer vision; image classification; convolutional neural network; CNN hyper-parameters; enhanced E-CNN-MP; genetic algorithms; learning accuracy


I. INTRODUCTION
Image classification is an important task in computer vision, underlying a large area of applications such as object detection, localization and image segmentation [1][2][3]. The most widely adopted methods for image classification are based on deep neural networks, especially Convolutional Neural Networks (CNN). These deep networks have demonstrated impressive and sometimes human-competitive results [4,5]. A CNN deep architecture can be divided into two main parts [6]. The first part, based on convolutional layers, performs feature extraction and input image encoding. The second part is a fully connected neural network classifier whose role is to generate a prediction model for the classification task. A CNN model is described by many hyper-parameters, specifically the number of convolutional layers, the number of filters and their respective sizes, etc.
Many researchers have proposed different CNN models such as AlexNet, Znet, etc. To improve network accuracy, some of them choose to increase the depth of the network [7]. Others propose new internal configurations [8]. Although these state-of-the-art CNNs have been shown to be efficient, many of them were manually designed.
During our research, we note that misconfigured values of CNN hyper-parameters, namely the network depth, the number of filters and their respective sizes, dramatically affect the performance of the classifier. In addition, manually enumerating all the use cases and selecting optimal values for these hyper-parameters is almost impossible, even with a fixed number of convolutional layers. Through the contributions held in this paper, we propose an innovative approach, labeled Enhanced Elite CNN Model Propagation (Enhanced E-CNN-MP), to automatically learn optimal CNN hyper-parameter values leading to the best CNN structure for a particular classification problem. Our approach is based on Genetic Algorithms (GA), known to be meta-heuristic methods for non-deterministic problem resolution. Each candidate CNN structure is encoded as an individual (chromosome). To search for the best-fit individual, the proposed method is based on "elite propagation" through the whole GA process.
The designed Enhanced E-CNN-MP approach is an innovative one. Our contribution will allow scientists to design their own CNN-based prediction model suitable for their particular image classification problem. This paper is organized as follows. Section II provides an overview of Convolutional Neural Networks and exposes the Genetic Algorithm paradigm. The problem statement is presented in Section III. Section IV introduces related work. Section V illustrates the designed Elite CNN Model Propagation (E-CNN-MP) approach, based on GAs, for CNN hyper-parameter optimization. E-CNN-MP simulations and results are presented in Section VI. In Section VII, an Enhanced E-CNN-MP version is proposed. The last section includes our concluding remarks.

II. DEEP LEARNING BASED ON CONVOLUTIONAL NEURAL NETWORK
A neural network is a mathematical model whose design is inspired by biological neurons. The network architecture is divided into layers, each layer being a set of neurons. The first layer of a neural network is the input layer, into which the input data is injected. As shown in Fig. 1, in a neural network, a single neuron has several inputs. Each input connection is characterized by a weight w_i. On activation, the artificial neuron computes its state s by summing all the inputs multiplied by their corresponding connection weights. To ensure that the neuron can be activated even when all entries are zero, an extra input, called the bias b, is added. This extra input is always equal to 1 and has its own connection weight.
To normalize its result (normally between 0 and 1), the neuron passes its state through an activation function [9]. CNNs are a category of deep neural networks used especially in the computer vision area, notably for image classification [10,11]. There are four main operations which are the basic building blocks of every CNN [13,14].
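The computation of a single neuron described above can be sketched as follows; the sigmoid activation and the example weights are illustrative choices, not values from the paper:

```python
import math

def neuron_output(inputs, weights, bias_weight):
    # State s: weighted sum of the inputs, plus the bias input (always 1).
    state = sum(x * w for x, w in zip(inputs, weights)) + 1.0 * bias_weight
    # Activation: squash the state into the 0-1 range with a sigmoid.
    return 1.0 / (1.0 + math.exp(-state))

# A neuron with two inputs and a bias.
out = neuron_output([0.5, -0.2], [0.8, 0.4], bias_weight=0.1)
print(out)
```

The bias weight shifts the state so the neuron can fire even when all inputs are zero, exactly as described above.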

A. Convolution
The convolutional layer derives its name from the convolution operator. The aim of this layer is image feature extraction. Convolution preserves the spatial relationship between pixels by learning image features using small squares of input data. Each convolution layer uses various filters for feature detection and extraction, such as Edge Detection, Sharpen, Blur, etc. These filters are also called "kernels" or "feature detectors". After sliding the filter over the image we get a matrix known as the feature map [15].
In the first convolutional layer, the convolution is between the input image and its filters. The filter values are the neuron weights (see (1)):

y = x * W + b    (1)

In the deeper layers of the network, the resulting image is the sum of convolutions over the N_{l-1} outputs of layer l-1 (see (2)):

y_j^l = Σ_{i=1..N_{l-1}} o_i^{l-1} * W_{ij}^l + b_j^l    (2)

with y_j^l the state of neuron j of layer l, o_i^{l-1} the i-th output of layer l-1, W_{ij}^l the weight (filter) matrix and b_j^l the bias.
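The convolution of (1) can be illustrated with a minimal NumPy sketch over a hypothetical 3x3 input and a 2x2 filter (valid convolution, i.e. no padding; the kernel values are our own):

```python
import numpy as np

def conv2d(image, kernel, bias=0.0):
    # Valid cross-correlation (the operation CNN frameworks call "convolution").
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the patch and the filter, then sum.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel) + bias
    return out

image = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
kernel = np.array([[1., 0.],
                   [0., -1.]])  # a simple diagonal difference filter
feature_map = conv2d(image, kernel)
print(feature_map)
```

Sliding the 2x2 filter over the 3x3 image yields a 2x2 feature map, matching the text: the filter weights are exactly the neuron weights being learned.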
In practice, a CNN learns the values of these filters on its own during the training process. Parameters such as the number of filters, the filter sizes and the network architecture are specified by the scientist before launching the training process. The more filters we have, the more image features get extracted and the better the network becomes at feature extraction and image classification.
The size of the feature map is controlled by three parameters which are:
 Depth: the number of filters used for the convolution operation.
 Stride: the number of pixels by which we slide the filter matrix over the input matrix.
 Zero-padding: to apply the filter to the bordering elements of the input image, it is convenient to pad the input matrix with zeros around the border.
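These parameters combine in the standard relation O = (W - F + 2P)/S + 1 (input size W, filter size F, zero-padding P, stride S), which can be checked quickly; the AlexNet-style numbers below are illustrative:

```python
def feature_map_size(input_size, filter_size, padding, stride):
    # O = (W - F + 2P) / S + 1
    return (input_size - filter_size + 2 * padding) // stride + 1

# AlexNet-style first layer: 227x227 input, 11x11 filters, stride 4, no padding.
print(feature_map_size(227, 11, 0, 4))  # prints 55

# With padding 1, a 3x3 filter and stride 1 preserve a 5x5 input.
print(feature_map_size(5, 3, 1, 1))  # prints 5
```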

B. Activation Function
Once the convolution has been completed, an activation function is applied to all values in the filtered image to extract nonlinear features. There are many activation functions, such as the ReLU [16,17], defined as f(x) = max(0, x), the tanh function [18] or the sigmoid function [19].
The output value of a neuron j of layer l depends on its activation function f and is defined as (see (3)):

o_j^l = f(y_j^l)    (3)

with f the activation function and y_j^l the state value of neuron j of layer l. The choice of the activation function may depend on the problem. The ReLU function replaces all negative pixel values in the feature map by zero. The purpose of ReLU is to introduce non-linearity in the CNN, since the convolution operator is a linear operation while most of the CNN input data is non-linear. The result of the convolution and ReLU operations is called the rectified feature map.
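The ReLU behavior described above can be sketched on a toy feature map (the values are illustrative):

```python
import numpy as np

relu = lambda x: np.maximum(0, x)               # f(x) = max(0, x)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))    # squashes into (0, 1)

fm = np.array([[-2.0, 3.0],
               [0.5, -1.0]])
rectified = relu(fm)   # negative values replaced by zero
print(rectified)
```

Only the negative entries change, which is why the result is called a rectified feature map.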

C. Pooling
The pooling step is also called subsampling or down-sampling. It aims to reduce the dimensionality of each rectified feature map while retaining the most important information.
The two most used methods for this operation are average and max pooling [20].
After this sub-sampling step, we get a feature map defined in (4), with P_j^l the feature map of layer l, pool the pooling operation and o_j^{l-1} the output value of neuron j of layer l-1:

P_j^l = pool(o_j^{l-1})    (4)

The advantages of the pooling function are:
 It makes the input representations (feature dimension) smaller and more manageable.
 It reduces the number of parameters and computations in the network, therefore, controlling overfitting.
 It makes the network invariant to small transformations, distortions and translations in the input image. In fact, a small distortion in input will not change the output of pooling since the maximum/average value in a local neighborhood is taken.
 It helps getting an almost scale invariant representation of input image. This is very powerful since we can detect objects in an image no matter where they are located.
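The max-pooling operation behind these properties can be sketched over a hypothetical 4x4 rectified feature map, with a 2x2 window and stride 2:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    # Slide a non-overlapping window and keep the maximum of each patch.
    H, W = fmap.shape
    out = np.zeros((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = patch.max()
    return out

fmap = np.array([[1., 3., 2., 0.],
                 [4., 6., 1., 1.],
                 [0., 2., 5., 7.],
                 [1., 1., 8., 2.]])
print(max_pool(fmap))
```

Each 2x2 neighborhood collapses to its maximum, so small shifts of the input inside a window leave the output unchanged, which is the translation-invariance property noted above.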

D. Fully Connected Layer
The output of the convolutional and pooling layers of a CNN is the image feature vector. The purpose of the fully connected layer is to use this feature vector for classifying the input images into several classes based on a labeled training dataset.
The fully connected layer is composed of two parts. The first part consists of so-called fully connected layers, where all neurons are connected to all the neurons of the previous and next layers. The second part is based on an objective function. In fact, CNNs seek to optimize some objective function, specifically the loss function. The most widely used loss function is the Softmax function [21]. It normalizes the results and produces a probability distribution over the different classes (each class will have a value in the range [0, 1]) [22]. Adding a fully-connected layer allows learning non-linear combinations of the extracted features, which might be even better for the classification task.
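The Softmax normalization described above can be sketched as follows (the class scores are illustrative):

```python
import numpy as np

def softmax(scores):
    # Shift by the max for numerical stability, then normalize
    # so the outputs form a probability distribution.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())
```

Every output lies in [0, 1] and the outputs sum to 1, so the predicted class is simply the one with the highest probability.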

E. Genetic Algorithms
GAs are heuristic solution-search or optimization methods, originally inspired by the Darwinian principle of evolution through (genetic) selection.
A GA relies on a highly abstract form of evolutionary processes to provide solutions to complex problems. Each GA operates on a population of artificial chromosomes. Each chromosome represents a solution to the problem to be solved and has a fitness: a real-number measure of its performance as a solution to the specific problem.
A GA begins with a randomly generated population of chromosomes. It then carries out a process of selection and recombination based on each chromosome's fitness. Parent genetic material is recombined to generate child chromosomes, producing a next generation. This process is iterated until some stopping criterion is reached. In this way, a GA evolves toward a best solution to a given problem.
GAs were first proposed by John Holland [23] as a method to find good solutions to problems that were otherwise computationally intractable. Holland's schema theorem, and the related building block hypothesis, provided a theoretical basis for the design of effective GAs. The development and success of GAs have significantly contributed to their adoption in many computational approaches based on natural phenomena. GA is, henceforth, a major part of the wider field of Computational Intelligence, alongside Neural Networks, Ant Colony Optimization, etc.

F. Genetic Algorithm Structure
A GA is made from a number of "standard" components, a design that facilitates their re-use, with trivial adaptation, in many different problems. The main components of a GA are: chromosome encoding, fitness function, selection, recombination and the evolution scheme.
1) Encoding: A chromosome encodes a candidate solution to the problem. A chromosome is an abstraction of a biological DNA chromosome and can be thought of as a combination of genes. For a given problem, a particular representation is used and referred to as the GA encoding of the problem. GAs propose two ways for chromosome encoding:
 A bit-string representation to encode solutions: bit-string chromosomes consist of a string of genes whose allele values are characters from the alphabet {0,1}.
 Value Encoding: a chromosome, in direct value encoding, is a string of values which can be of whatever form is related to the problem, such as numbers, real numbers, chars or more complicated objects.
2) Fitness: The fitness function computes and evaluates the quality of a chromosome as a solution to a particular problem. Fitness computation goes on through the GA generations, measuring the performance of each individual in terms of the various criteria and objectives defined by researchers (completion time, resource utilization, cost minimization, etc.).
3) Selection: The selection method in a GA is very important as it guides the evolution of chromosomes through generations. This method chooses the parent chromosomes to be used for child chromosome creation.
In the GA process, chromosome selection for recombination is based on fitness value: best-fit individuals should have a greater chance of selection than those with lower fitness.
Many selection methods are proposed in the literature, such as [24]:
 Roulette Wheel (or fitness-proportional) selection, which allocates to each chromosome a probability of being selected proportional to its relative fitness. This value is computed as a proportion of the sum of all chromosomes' fitness in the population.
 Random Stochastic selection explicitly chooses each individual a number of times equal to its expectation of being selected under the fitness proportional method.
 Tournament selection first chooses two individuals based on a uniform probability and then selects the one with the highest value of fitness.
 Truncation selection first eliminates a fixed number of the least-fit chromosomes and then picks one at random from the remaining population.
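The roulette-wheel scheme above can be sketched as follows; the fitness values are hypothetical, and higher fitness here means a larger wheel segment:

```python
import random

def roulette_select(population, fitnesses):
    total = sum(fitnesses)
    r = random.uniform(0, total)          # spin the wheel
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if r <= cumulative:               # the segment spanning r wins
            return individual
    return population[-1]                 # guard against float rounding

pop = ["A", "B", "C"]
fits = [1.0, 2.0, 7.0]                    # C owns 70% of the wheel
picks = [roulette_select(pop, fits) for _ in range(1000)]
print(picks.count("C"))                   # roughly 700 on average
```

Over many spins, each individual is chosen in proportion to its share of the total fitness, which is exactly the fitness-proportional behavior described above.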

4) GA Recombination operators:
The GA recombination method produces offspring with combinations of genetic material from parents chosen through the selection method. This process forms the members of a successor population by recombining chromosomes selected from a source population. Since the selection mechanism is biased towards chromosomes with higher fitness values, this (hopefully) guarantees the evolution toward more highly fit individuals in the descendant generations.
There are two main operators for genetic recombination: crossover and mutation. These genetic operators are nondeterministic in their behavior and outcome: each happens with a certain probability.
The crossover operator mixes genes from two selected parent chromosomes. This recombination produces one or two child chromosomes.
The literature proposes many alternative forms of the crossover method:
 One-point crossover, generalized to 2- and multi-point crossover operations: the idea is to choose a sequence of crossover points along the chromosome length. Child chromosomes are subsequently created by interchanging the gene values of both parents at each chosen crossover point.
 Uniform crossover creates a child chromosome by picking uniformly between parent gene values at each chosen position.
Crossover algorithms also vary according to the number of children created through the process.
To ensure maximum diversity when creating offspring, all chromosomes resulting from crossover are then passed on to the mutation process. Mutation operators act on an individual chromosome to change one or more gene values. The aim of these genetic operators is to increase population diversity and avoid premature convergence to a less optimal solution for a particular problem.

5) Evolution:
After the crossover and mutation process, the resulting chromosomes are passed into the descendant population called the next generation. This process is then iterated for all upcoming generations until a stopping criterion is reached. Termination conditions can include:
 A solution satisfying minimum criteria is found.
 A fixed number of generations has elapsed.
 The allocated budget (e.g. computation time or cost) is reached.
 The highest-ranking solution's fitness has reached a plateau, such that successive generations no longer yield better results.
 Manual inspection confirms that a set of constraints is fully satisfied.
The evolutionary scheme determines the degree to which individuals from a source population are permitted to move unchanged into the next generation. The evolutionary scheme is an important aspect of GA design and depends closely on the nature of the solution space being investigated. These schemes vary from:
 Complete replacement, where all next-generation chromosomes are generated through selection, recombination and mutation.
 Steady state, where the next generation is created by generating one new chromosome at each new population and using it to replace a less-fit individual of the original population.
 Replacement-with-elitism: This is a hybrid complete replacement method in which the best one or two individuals from the source population are preserved in the next generation. This scheme prevents individuals of the highest relative fitness from being lost through the nondeterministic selection process.

G. GA Design
When the resolution of a problem is based on the GA metaheuristic approach, the scientist may make many choices in designing the genetic algorithm, related notably to the evolutionary scheme to be applied. Numerous examples of non-classical GAs can be found in the literature [25,26]. A typical architecture adopted for a classical GA using complete replacement with standard genetic operators might be as follows:
(S1) Randomly create an initial population of chromosomes (the source population).
(S2) Compute the fitness value f(c) of each chromosome c in the population.
(S3) Create a successor population of chromosomes as follows:
(S3a) Use the selection method to select two parent chromosomes, c1 and c2, from the source population.
(S3b) Apply the crossover technique to c1 and c2 with a crossover rate to get a child chromosome c.
(S3c) Apply the mutation method to c with a mutation rate to produce c'.
(S3d) Include the chromosome c' in the successor population.
(S4) Replace the source population with the successor population.
(S5) If the stopping condition is not reached, return to Step S2.
The flexibility of this standard architecture allows its implementation and refinement by scientists to fit the particular problem to be solved with this metaheuristic approach.
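Steps (S1)-(S5) can be sketched as a generic loop; the bit-string problem (maximizing the number of ones), the selection shortcut and all rates below are illustrative choices, not those used in the paper:

```python
import random

def run_ga(fitness, length=12, pop_size=20, generations=40,
           crossover_rate=0.8, mutation_rate=0.05):
    # (S1) Random initial population of bit-string chromosomes.
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # (S2) Fitness of each chromosome (sorted, fittest first).
        scored = sorted(pop, key=fitness, reverse=True)
        successor = []
        while len(successor) < pop_size:
            # (S3a) Select two parents (biased toward the fitter half).
            p1, p2 = random.sample(scored[:pop_size // 2], 2)
            # (S3b) One-point crossover with probability crossover_rate.
            child = p1[:]
            if random.random() < crossover_rate:
                point = random.randrange(1, length)
                child = p1[:point] + p2[point:]
            # (S3c) Mutation: flip each gene with probability mutation_rate.
            child = [1 - g if random.random() < mutation_rate else g for g in child]
            # (S3d) Add the child to the successor population.
            successor.append(child)
        # (S4) Replace the source population with the successor.
        pop = successor
    # (S5) The fixed-generation stopping condition ended the loop.
    return max(pop, key=fitness)

best = run_ga(fitness=sum)  # fitness = number of ones in the chromosome
print(sum(best))
```

Any component (encoding, selection, operators, stopping rule) can be swapped out without touching the overall loop, which is the re-usability point made above.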

III. PROBLEM STATEMENT
In this section we establish the context of the current work in relation to our previous investigations. The aim of our research is to design an approach to be used for the image classification task. For this purpose, we are interested in machine learning algorithms and especially supervised learning.
Machine learning (ML) covers a wide variety of algorithms particularly suited to prediction. ML avoids starting with a data model and rather uses an algorithm to learn the relationship between the response and its predictors. As shown in Fig. 2, ML techniques try to learn the response by observing inputs and responses and finding dominant patterns.
At the end of the training process we get a predictive model which can be used to classify a new input data.
When we use supervised learning algorithms for classification purposes, the input variables are labeled data (for each input from the dataset we know its class or category) and the output variable represents a category (class). A supervised algorithm aims to learn the mapping function f from the input X to the output Y: Y = f(X). The goal is to approximate the mapping function f in such a way that, for new input data x, the algorithm can predict its category y = f(x). Learning stops when the algorithm achieves an acceptable level of prediction accuracy.
In previous works, we investigated many approaches to design a machine learning framework for image classification. Two approaches were studied: 1) The Bag of Features (BoF) paradigm and CNN as feature extraction and image encoding methods [27,28]. Our experimental results, shown in Fig. 3 and Fig. 4, demonstrated that CNN performs better than BoF as a feature extractor and image encoding technique [30].
2) A Deep Learning approach based on the Transfer Learning technique. The pre-trained CNN used is AlexNet [29]; its architecture is described in Fig. 5 [6]. Based on this approach, we reached a classification accuracy of 93.33% [31].
In the current work, our overarching approach is to design our own CNN model for stop sign image classification. The designed CNN will be trained from scratch. For the CNN to be created, many hyper-parameters must be defined, such as: the CNN depth, the number of filters per convolutional layer and their respective sizes. To build our network, we first choose to follow the same model as AlexNet. The CNN model includes 5 convolutional layers, each followed by a ReLU and a MaxPooling layer. The input image size is 227x227x3. The feature vector resulting from the convolutional part of the model has a dimension of 4096, which is a good dimension to encode an image for the classification task. The fully connected part of the model is composed of three fully connected layers. The first two layers have 4096 neurons each and use the ReLU activation function. The third layer is a softmax that computes the probability distribution over the five classes. In this first simulation, we put a large number of filters in the convolutional layers; this number of filters and their sizes are chosen randomly, referring to already developed CNNs such as AlexNet.
The designed CNN architecture is represented in Table I. The accuracy of the manually designed model is very poor and does not exceed 30.8%. The main issue of this model is that the values of its hyper-parameters are not optimal for our stop sign image classification problem.
In this stage of the research, we have to solve a problem with N variables corresponding to the CNN hyper-parameters while ensuring a good learning accuracy. In order to optimize the values of these variables, we design a solution based on the genetic algorithm approach, well known for non-deterministic problem resolution.

IV. RELATED WORK
For decades, neural networks have proved their ability in machine learning. To increase network performance, some research efforts are based on deeper networks [35,36] while others propose adding highway information [37,38].
One of the most challenging aspects of deep networks is how to configure them and search for their hyperparameter values. To address this problem, some proposed methods include the use of stochastic depth [39,40] or dense convolutional networks [41]. However, the limit of these approaches is that all proposed deep network structures are deterministic which limits the flexibility of the models and consequently motivates us to design an automated search for optimal CNN hyperparameters for a given classification task.
In fact, searching for optimal deep network hyper-parameters can be conducted through different strategies, ranging from exhaustive grid search and simple random search to heuristic search such as genetic algorithms or Bayesian optimization.
Reference [42] shows that a simple random search gives better results than grid search, particularly for high-dimensional problems with low intrinsic dimensionality. In [43] and [44], the proposed methods are based on a Bayesian optimization process and yield better performance. In this paper we investigate a heuristic search. Our strategy is based on genetic algorithms, well known for non-deterministic problem resolution. Our work aims to design and experiment with a new, competitive GA encoding method for CNN structure search.

V. PROPOSED APPROACH: ELITE CNN MODEL PROPAGATION (E-CNN-MP)
To design an optimal CNN model for our classification problem, we develop a GA-based approach labeled "Elite CNN Model Propagation" (E-CNN-MP). The main structure of a GA method is adopted; we then develop specific methods for chromosome encoding, chromosome recombination and the fitness function.
In this section, we describe the E-CNN-MP modules that allow evolving optimal hyper-parameters of a CNN for the stop sign image classification problem. In the proposed framework, each chromosome is a candidate solution representing a CNN architecture. The training process error is chosen as the fitness function of a chromosome. In this way, the GA-based solution aims to compute the optimal hyper-parameter values giving the lowest error and consequently the highest classification accuracy.

A. Chromosome Encoding
In the designed approach, each chromosome represents a solution to the problem. For a CNN with D_p convolutional layers (the CNN depth), the genetic algorithm inputs are 2*D_p variables to be optimized through the GA process. These variables are D_p pairs of values (Filter Number per Layer, FNL; Filter Size per Layer, FSL). The value-encoding scheme of a chromosome (an individual in a population) is illustrated in Fig. 7.
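Under this encoding, a chromosome for a depth-D_p network is a flat vector of D_p (FNL, FSL) pairs. A minimal sketch follows, in which the gene ranges are our own illustrative assumptions, not the bounds used in the paper:

```python
import random

DEPTH = 5                      # number of convolutional layers (D_p)
FNL_RANGE = (8, 256)           # filters per layer (illustrative bounds)
FSL_RANGE = (3, 11)            # filter size per layer (illustrative bounds)

def random_chromosome():
    # 2 * D_p genes: a (filter_number, filter_size) pair per conv layer.
    genes = []
    for _ in range(DEPTH):
        genes.append(random.randint(*FNL_RANGE))   # FNL gene
        genes.append(random.randint(*FSL_RANGE))   # FSL gene
    return genes

c = random_chromosome()
print(len(c), c)
```

Decoding is direct: genes at even positions give the filter count of each layer and genes at odd positions give the corresponding filter size.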

B. Population Initialization
The GA population is a set of C individuals. Each individual is represented by a vector of length 2*D_p (the number of GA variables).
To initialize the first population, we use uniformly randomized values in defined intervals: each FNL gene is drawn from its filter-number range and each FSL gene from its filter-size range.
These values will be modified through mutation and crossover when discovering competitive structures during the genetic process. Each initialized individual is then evaluated: we compute its fitness score, which is the classification error of the corresponding CNN. Our approach searches for the optimal individual minimizing the fitness function and subsequently the classification error. Computing an individual's fitness requires a whole CNN training and evaluation process, which involves heavy computation. For all simulations we use a single GPU. To evaluate the designed approach, generations of eight individuals are used. This number can be generalized and scaled up if more resources are available.

C. Selection Method
At the beginning of every generation creation, we apply a selection method. A successor generation G' of a source generation G is defined as follows: 1) A fraction of elite individuals from G is propagated to G'. These individuals are the fittest ones, with the lowest fitness-function values (CNN classification error), over the whole generation G.
2) A fraction of G', other than the elite children, is created by crossover.
3) The remaining individuals of the new generation are chosen randomly. This ensures population diversity and prevents the genetic algorithm from converging too rapidly.
Our selection approach aims to eliminate the least fit individuals from each generation.
To select the parents of crossover children, we perform a roulette method. During this step, we simulate a roulette wheel in which the section area corresponding to an individual is proportional to the individual's expectation e_i.
In a population of size N, for an individual i with fitness value f_i, its expectation is computed as its relative share of the total fitness:

e_i = f_i / Σ_{j=1..N} f_j

The method then generates a random number r in the interval [0, 1]. The individual whose segment spans the random number is chosen. This process is repeated until the preferred number of children to be created is reached.

D. Crossover Method
Crossover is a basic operator used in GAs for producing new children which inherit parts of both parents' genetic material. In the proposed approach, a scattered crossover technique is used. To create a child, we:
1) Select two parents by means of the selection method.
2) Generate a random binary vector whose length is the length of a chromosome.
3) Form the child by taking the gene from the first parent where the vector value is 1 and the gene from the second parent where the vector value is 0.
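The three steps above can be sketched as follows (the parent chromosomes are hypothetical (FNL, FSL) vectors):

```python
import random

def scattered_crossover(parent1, parent2):
    # Step 2: random binary vector of chromosome length.
    mask = [random.randint(0, 1) for _ in range(len(parent1))]
    # Step 3: gene from parent1 where the mask is 1, from parent2 where it is 0.
    return [g1 if m == 1 else g2 for g1, g2, m in zip(parent1, parent2, mask)]

p1 = [64, 5, 128, 3, 256, 3]   # hypothetical parent chromosomes
p2 = [32, 7, 96, 5, 192, 7]
child = scattered_crossover(p1, p2)
print(child)
```

Unlike one-point crossover, every gene position is chosen independently, so the child can mix parental material at any granularity.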

E. Mutation Method
The mutation method is applied to children created by the crossover mechanism. The genetic algorithm applies small random changes to each child. Mutation offers population diversity and allows the genetic algorithm to explore a broader space.
Our mutation method is a two-step process: 1) Randomly select the fraction of the child vector to be mutated according to a probability rate p_m. In practice, p_m is often small; this small value guarantees that the mutation operator preserves the good properties of a chromosome while exploring new possibilities. In our experimentation, a small value of p_m is chosen. 2) Replace each selected entry (chromosome gene) by a random number chosen from its corresponding range.
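The two-step mutation can be sketched as follows; the 5% rate and the gene ranges are illustrative assumptions, not the paper's values:

```python
import random

def mutate(chromosome, gene_ranges, rate=0.05):
    # Step 1: each gene is selected for mutation with probability `rate`.
    # Step 2: a selected gene is replaced by a random value from its own range.
    return [random.randint(*rng) if random.random() < rate else gene
            for gene, rng in zip(chromosome, gene_ranges)]

ranges = [(8, 256), (3, 11)] * 3          # (FNL, FSL) bounds per layer
child = [64, 5, 128, 3, 256, 3]
mutant = mutate(child, ranges)
print(mutant)
```

Because each gene is re-drawn from its own range, a mutated chromosome always remains a valid CNN configuration.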

F. Termination of the GA
A GA is known to be a stochastic search method. The specification of a convergence criterion is sometimes problematic, as the fitness value may remain unchanged for a number of generations before a superior individual is found. In our approach, we choose to terminate the GA process after a pre-specified number of generations, to avoid hardware saturation. We then verify the quality of the best individual's fitness and, if necessary, restart the GA process with the initialization of a fresh search.

G. The GA Fitness Function
In this work we are optimizing CNN hyper-parameters for an image classification task. Each individual corresponds to a plausible configuration of a CNN: it specifies its number of filters and their respective sizes. The fitness of an individual is computed by training the corresponding CNN from scratch on an input dataset. In our approach, the classification error is used as the individual's score.
The input dataset is divided into a training set and a validation set. The individual's score is the classification error computed after training.

H. Designed Algorithms
According to the previously exposed GA methods, the E-CNN-MP main algorithm is described in Algorithm 1. This algorithm returns the best individual and the corresponding CNN for which we save all weights and biases.
The FitnessCNN function used in the main program is the GA fitness function, described in Algorithm 2. It constructs the CNN model according to the individual's values generated through the GA process and then carries out a from-scratch training.
The methods used for successor generation creation are described in Algorithm 3. After the crossover operation, the mutation operation randomly selects a child gene position i to be mutated (i ← random(1, length(C))), randomly selects a value x from that gene's initial range and sets C(i) ← x. Finally, the remaining N'' = N - (E + N') individuals of P' are generated randomly, to guarantee diversity, and P' is returned.

VI. SIMULATIONS AND RESULTS OF E-CNN-MP
The proposed approach for CNN hyper-parameter optimization is executed on a single GPU. The settings of the global GA parameters used for the method implementation are synthesized in Table II. To search for the best CNN model for our particular classification task, the GA process is iterated 5 times and the performance of the GA result is verified. Table III shows the individual scores during each GA process. Simulation results are summarized in Table IV. The best individual reached by the GA process is a CNN model offering 89.47% accuracy. In addition, compared to the manually designed CNN (Section III, Table I), it gives better classification accuracy.
Despite this accuracy improvement, we notice that the GA individual scores oscillate considerably through generations, giving an average accuracy not exceeding 50%.

VII. ENHANCED E-CNN-MP
In the following step, the CNN training process is improved to obtain better classification accuracy values. Many methods have been proposed to optimize deep neural network training. Some of them try to reduce the sensitivity to network initialization. Among these enhanced initialization schemes, we can quote: Xavier Initialization [45], Theoretically Derived Adaptable Initialization [46], Standard Fixed Initialization [6], etc. However, research demonstrates that these methods have some limits. In fact, Xavier Initialization is not suited for CNNs with rectification-based nonlinear activations. Although the Theoretically Derived Adaptable Initialization method improves convergence characteristics, it is not confirmed that it leads to better accuracy. It is also proven that Standard Fixed Initialization delays convergence because of the magnitude of the gradients or activations in the final layers of a deep network [7,47].
Other methods aim to reduce the internal covariate shift phenomenon produced by variations in the distribution of each layer's inputs due to parameter changes in the previous layer. The internal covariate shift problem can dramatically affect CNN training [48]. In fact, during the training process, as the data flows through the CNN, its values are adjusted by the weights and parameters, sometimes making the data too big or too small. To largely avoid this problem, the idea is to normalize the data in each mini-batch, and not only the input data during the preprocessing step [48]. For each input channel across a mini-batch, activation normalization is first performed by subtracting the mini-batch mean and dividing by the mini-batch standard deviation. The input is then shifted by a learnable offset β and scaled by a learnable scale factor γ.
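The mini-batch normalization described above can be sketched in NumPy (γ and β are fixed here for illustration; during training they are learned):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each channel across the mini-batch, then scale and shift.
    mean = x.mean(axis=0)                 # per-channel mini-batch mean
    var = x.var(axis=0)                   # per-channel mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.array([[1.0, 50.0],
                  [3.0, 60.0],
                  [5.0, 70.0]])           # 3 samples, 2 channels
normalized = batch_norm(batch)
print(normalized.mean(axis=0), normalized.std(axis=0))
```

After normalization, each channel has roughly zero mean and unit variance regardless of the scale of the incoming activations, which is what stabilizes training between layers.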
To enhance the designed E-CNN-MP approach, CNN batch normalization is adopted. Compared to Fig. 6, the training accuracy results presented in Fig. 8 show that merely adding batch normalization to our CNN model yields a considerable speedup and achieves higher classification accuracy.

VIII. CONCLUSIONS
In this work, CNN is investigated as an image classification method. The performance of such a network depends on the setting of its hyper-parameters, such as the number of convolutional layers, the number of filters per layer and their respective sizes. Our particular problem is to generate a predictive model for stop sign image classification by the use of a CNN. First, we tried to manually design the CNN structure. This approach gives a very poor classification accuracy (30%).
As the number of candidate solutions is very large, we developed the E-CNN-MP framework, based on GA methods, to search for the best CNN structure. This heuristic method starts by creating an initial population of potential CNN structures and then evaluates each individual by computing its from-scratch classification error. For all CNN trainings we use a reference dataset. During simulations, the network is set as a block of convolutional, ReLU and MaxPooling layers. Simulations prove the ability of the GA process to find the elite CNN model. This model offers a classification accuracy of about 90%. Once the good behavior of the GA process was verified, we improved the framework to get better results: we insert a batch normalization layer after each convolutional layer to improve the quality of network training. The Enhanced E-CNN-MP performs better than the first designed version. GA simulations allow us to obtain a pre-trained CNN achieving an accuracy of 98.94%.
In this article we propose a competitive strategy using GAs to search for the best CNN structure, offering a high-quality pre-trained CNN suitable for stop sign image classification. The designed Enhanced E-CNN-MP approach is an innovative one.
Our contribution will allow scientists from any field of research (biology, medicine, robotic, geology ...) to design their own Convolutional Neural Network (CNN) prediction model suitable for their particular image classification problem.