Evolutionary Strategy of Chromosomal RSOM Model on Chip for Phonemes Recognition

This paper aims to contribute to the modeling and implementation, on a system on chip (SoC), of a powerful technique for phoneme recognition in continuous speech. A neural model known for its efficiency in recognizing static data, the self-organizing map (SOM), is extended into a recurrent model (RSOM) to incorporate the temporal aspect of these applications. The resulting RSOM model is then used to diversify the populations of a genetic algorithm (GA), which further expands the search space and optimizes the results. We assign a chromosomal view to this model in an effort to improve the information recognition rate.

Keywords: Information recognition; Recurrent SOM; Chromosomal RSOM model; Evolutionary RSOM; Implementation over SoC


II. THE CHOSEN SOM MODEL FOR PHONEME RECOGNITION
1) Adopted Approach
The phoneme recognition experiments with the SOM are run on a multi-speaker speech corpus. The process is as follows:
- Read the speaker's speech and segment it into 10 ms frames, over which the signal can be treated as stationary.
- Transform these frames into code vectors of MFCC coefficients. Some approaches use only the MFCCs of the central windows of each phoneme, where the energy is considered maximal [1]. In our approach, we use the MFCCs of all the windows of each phoneme, so that all of its energy is represented. The choice of MFCC coefficients for the acoustic vectors is based on comparative studies of the phoneme recognition rates obtained with different characteristic parameters of the signal. Fig. 1 shows the scores on a test set of 100 TIMIT phonemes obtained by Bruno Gas in his habilitation at the University of Paris 6 in 2005 [2].
- Convert the phonemic data of each class into a data structure 'sD' whose dimensions match the SOM map structure. This step is very important because it adapts the input data so that the SOM map can accept them.
- Train the SOM map, after studying and choosing its parameters, such as size and network topology.
- Finally, repeat the SOM map experiments on the TIMIT test set, which is specific and contains fewer samples. In this step, the SOM map is trained on the TIMIT training set and the phoneme recognition program is run on the test data structure to obtain the generalization rate. Generalization rests on the idea that learning the data well does not mean learning it by heart: the map must perform well on any new point and in varied situations [3].

www.ijacsa.thesai.org
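The framing step in the list above can be illustrated with a short Python sketch. It assumes the 16 kHz sampling rate of the TIMIT recordings and non-overlapping 10 ms frames (160 samples each); the function name `frame_signal` is ours, and the MFCC extraction that would follow on each frame is not shown.

```python
def frame_signal(samples, sample_rate=16000, frame_ms=10):
    """Split a 1-D signal into consecutive, non-overlapping frames of frame_ms milliseconds.

    Samples left over at the end (less than one full frame) are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # 160 samples at 16 kHz and 10 ms
    n_frames = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


if __name__ == "__main__":
    one_second = [0.0] * 16000  # one second of silence at 16 kHz
    frames = frame_signal(one_second)
    print(len(frames), len(frames[0]))  # 100 160
```

Whether the paper's frames overlap is not stated; overlapping windows would only change the slicing step.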

2) Dynamic extension of SOM
Temporal data processing is a very important task for which there is no unified approach. In the RSOM algorithm, each prototype has a weight vector representing its position in the data distribution, and also a context vector representing the activation of the whole map at the previous iteration. The closest prototype is therefore selected with a distance that accounts, on the one hand, for the difference between the data and the prototype weight and, on the other hand, for the difference between the previous map activity and the prototype's context. Updating the prototypes means adjusting both the weight and the context of the winning prototype and of its neighbors [3]. This idea was developed by Thomas Voegtlin in 2000, and it is represented below in our phoneme recognition strategy.

This approach also derives from the Elman SRN (Simple Recurrent Network) structure introduced in 1990. The SRN is a modified perceptron with one hidden layer that uses a delayed copy of its hidden-layer activities as an additional input. Its task is to learn associations between input and output sequences, and it is trained with the error backpropagation algorithm (see Fig. 3). The representation in the context layer of an SRN is the same as in its hidden layer. Hence, the hidden layer learns to represent its own past activities, since it receives them at every step; in this sense, the hidden-layer representation is called self-referent. This self-reference influences learning by acting on the error function between the desired and the actual output at a given iteration [4].

In the RSOM map, each neuron output is modeled by an electrical low-pass filter (Fig. 4). The time constant T = RC of the RC circuit is expressed in seconds. Combined with an operational amplifier, this filter circuit is called an integrator; it introduces the time factor through the charging and discharging of the capacitor with time constant T = RC.
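The distance and update rules described above can be sketched as follows. This is a minimal illustration in the form usually given for Voegtlin's recursive SOM, assuming a 1-D map, a Gaussian neighborhood, a map activity y_i = exp(-d_i), and illustrative values for the mixing coefficients `alpha` and `beta`, the learning rate and the neighborhood width. The class name `RecursiveSOM` is hypothetical and this is not the authors' implementation.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class RecursiveSOM:
    """1-D recursive SOM: each unit i has an input weight w[i] and a context vector c[i]."""

    def __init__(self, n_units, dim, alpha=1.0, beta=0.5, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
        # One context entry per unit: c[i] is compared with the whole map's previous activity.
        self.c = [[rng.random() for _ in range(n_units)] for _ in range(n_units)]
        self.alpha, self.beta = alpha, beta
        self.y_prev = [0.0] * n_units  # map activity at the previous time step

    def step(self, x, lr=0.1, sigma=1.0):
        """Present one input vector, update weights and contexts, return the winner index."""
        # Distance = input term + context term.
        d = [self.alpha * sq_dist(x, wi) + self.beta * sq_dist(self.y_prev, ci)
             for wi, ci in zip(self.w, self.c)]
        bmu = min(range(len(d)), key=d.__getitem__)
        for i in range(len(self.w)):
            h = lr * math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))  # Gaussian neighborhood
            self.w[i] = [wk + h * (xk - wk) for wk, xk in zip(self.w[i], x)]
            # Context weights move toward the previous map activity y(t-1).
            self.c[i] = [ck + h * (yk - ck) for ck, yk in zip(self.c[i], self.y_prev)]
        self.y_prev = [math.exp(-di) for di in d]  # current activity becomes the next context
        return bmu


if __name__ == "__main__":
    som = RecursiveSOM(n_units=6, dim=3)
    sequence = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7], [0.1, 0.2, 0.3]]
    print([som.step(x) for x in sequence])
```

Because the context term depends on the map's history, the same input frame can win different units depending on the frames that preceded it, which is what lets the map encode temporal order.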
When charging starts, the capacitor is initially discharged: $U_C(0) = 0$ V. For a capacitor initially charged to the potential $E$, the discharge is given by:

$$U_C(t) = E\, e^{-t/RC} \qquad (1)$$
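Eq. (1) can be checked numerically with a discrete-time leaky integrator. The sketch below assumes the exact discretization of the RC low-pass filter for an input held constant over each step dt, so that with zero input it reproduces the discharge curve of Eq. (1), and with a constant input E it gives the charging curve $E(1 - e^{-t/RC})$. The function name and the values E = 5 V, RC = 10 ms and dt = 1 ms are illustrative assumptions.

```python
import math


def leaky_integrator(inputs, rc, dt, u0=0.0):
    """Discrete RC low-pass filter (RC * dU/dt = x - U), exact for inputs held over each dt."""
    decay = math.exp(-dt / rc)  # fraction of the capacitor voltage kept after one step
    u, out = u0, []
    for x in inputs:
        u = decay * u + (1.0 - decay) * x
        out.append(u)
    return out


if __name__ == "__main__":
    E, RC, dt = 5.0, 0.01, 0.001  # 5 V, 10 ms time constant, 1 ms step
    discharge = leaky_integrator([0.0] * 10, RC, dt, u0=E)  # zero input: Eq. (1)
    print(round(discharge[-1], 4))  # 1.8394, i.e. E * exp(-1) after t = RC
```

Writing $a = 1 - e^{-dt/RC}$, the update reads $u[n] = (1 - a)\,u[n-1] + a\,x[n]$, which is the leaky-integrator form used in some RSOM formulations, where $x[n]$ is replaced by the difference between the input and a unit's weight vector.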

3) Experimentation and Results
We are interested in a multi-speaker speech recognition tool that works regardless of context. This requires a large vocabulary at our disposal, for training and for continuous recognition.
Many speech databases currently exist, especially for English. Our research focuses on applying the SOM and the RSOM to phoneme recognition on the TIMIT database, whose wide dissemination in the international community allows an objective assessment of performance. This speech corpus, DARPA TIMIT, was prepared at the National Institute of Standards and Technology (NIST), with funding from the Defense Advanced Research Projects Agency Information Science and Technology Office (DARPA-ISTO), to study the acoustic variability of American English across different dialects and regions for multiple speakers. The dialects correspond to 8 directories, DR1, DR2, ..., DR8, which contain the recordings of 630 American speakers (from the U.S.) saying 10 sentences each.
The corpus totals 6300 sentences from 630 speakers (438 men and 192 women), split as follows:
- 462 speakers (326 men and 136 women) for the training set;
- 168 speakers (112 men and 56 women) for the test set.
The database provides an accurate phonemic segmentation and labeling, which benefits model training.
The phoneme recognition results of the proposed method are listed in Table I. These experiments show that the performance of the Kohonen algorithm depends on many parameters, such as the input vector dimension, the SOM map dimension, the number of iterations, the number of speech samples used in the training or test stages, and the phoneme classes. In the trained map, each neural unit specializes in a particular kind of data. This model reaches a generalization rate of 61.66%, against a training rate of 82.96%. This result seems trapped in a local maximum, so the search space must be extended.
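The training and generalization rates quoted above require turning a trained map into a classifier. One common way, assumed here rather than taken from the paper, is to label each unit with the majority phoneme of the training vectors it wins and then count how often a vector's BMU carries its true label. The function names and toy data below are hypothetical.

```python
import math
from collections import Counter


def bmu(x, weights):
    """Index of the unit whose weight vector is closest (Euclidean) to x."""
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))


def label_units(weights, train_x, train_y):
    """Give each unit the majority phoneme label of the training vectors it wins."""
    votes = {}
    for x, y in zip(train_x, train_y):
        votes.setdefault(bmu(x, weights), Counter())[y] += 1
    return {unit: c.most_common(1)[0][0] for unit, c in votes.items()}


def recognition_rate(weights, labels, xs, ys):
    """Percentage of vectors whose BMU carries the correct label (unlabeled units count as errors)."""
    hits = sum(labels.get(bmu(x, weights)) == y for x, y in zip(xs, ys))
    return 100.0 * hits / len(xs)


if __name__ == "__main__":
    w = [[0.0, 0.0], [1.0, 1.0]]  # toy 2-unit map
    train_x, train_y = [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2]], ["aa", "iy", "aa"]
    test_x, test_y = [[0.05, 0.1], [1.0, 0.9], [0.8, 0.8]], ["aa", "iy", "aa"]
    labels = label_units(w, train_x, train_y)
    print(recognition_rate(w, labels, train_x, train_y),
          round(recognition_rate(w, labels, test_x, test_y), 2))  # 100.0 66.67
```

Computing the same rate on the training set and on the test set gives the training and generalization rates; a large gap between them, as reported above, indicates limited generalization.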
We therefore propose to add dynamic recurrent loops to the map to integrate the time aspect. Finally, we propose a possible hybridization of the SOM map with a genetic algorithm (GA).
The results are given by the two comparative figures.

III. ADOPTED STRATEGY FOR AN EVOLUTIONARY RSOM
The basic strategy of the GA-RSOM concept that we propose for raising the phoneme recognition rate revolves around the following main ideas:
- Establish recurrence loops on the SOM neural map, which is designed to collect and recognize static data; each neural unit of the map is regarded as a static combinational circuit. The recurrence loop gives every cell a memory effect, and thus integrates the temporal aspect into the processing of the received data. This gives each neuron a dynamic behavior compatible with the variability of the phonemic support patterns to be recognized.
- The resulting RSOM model exhibits a certain temporal diversity and a certain diversity of winning neurons, depending on parameters such as the number of phonemic inputs; the number of neurons in the RSOM map, which determines the frequency of the phonemic input positions; the extent of the neighborhood function, which corresponds to widening the bandwidth of the band-pass (BP) selective filter for the feature vectors of the different phonemes; and the number of iterations during the learning or test phase.
- Each iteration ends with a BMU: a winning neuron that specializes in the feature vectors sequentially presented at the RSOM network input.
- Each BMU representing a phoneme feature vector is regarded as a chromosome vector. This vector carries the individual traits that make the population of RSOM maps diverse.
- The diversity of the BMU samples makes it possible to apply the genetic algorithm (GA) over the different speech phonemes. This extends the search space and avoids being trapped in a local optimum, by ensuring the survival and recognition of the best phonemic individual [5], [6], [7], [8].
This strategy is described by the diagram of Fig. 8 and abstracted in the following algorithm. Each individual is assigned to one BMU obtained over the iterations. It is represented by a concatenated MFCC vector corresponding to one phonemic chromosome of an RSOM map [9].
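One possible reading of this encoding, sketched below under our own assumptions, builds an individual by concatenating the weight (MFCC prototype) vectors of the BMUs won by the successive frames of a phoneme. The function names `bmu_index` and `chromosome_from_sequence` and the toy codebook are illustrative, not the authors' code.

```python
import math


def bmu_index(x, weights):
    """Index of the prototype closest (Euclidean) to frame x."""
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))


def chromosome_from_sequence(frames, weights):
    """Concatenate the prototype vectors of the BMUs won by successive frames into one gene list."""
    genes = []
    for frame in frames:
        genes.extend(weights[bmu_index(frame, weights)])
    return genes


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # toy 2-coefficient prototypes
    phoneme_frames = [[0.1, 0.0], [1.9, 2.1], [0.9, 1.2]]
    print(chromosome_from_sequence(phoneme_frames, codebook))  # [0.0, 0.0, 2.0, 2.0, 1.0, 1.0]
```

With this encoding, chromosome length grows with the number of frames, so individuals compared by crossover would need the same number of frames (or padding); the paper does not say how this is handled.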

IV. THE GA-RSOM PARADIGM
If effective parents are chosen, the offspring are very likely to be at least as effective as their parents; this is the principle of selection and crossover by which species ensure the survival of the fittest. Accordingly, applying the genetic algorithm to RSOM maps for phoneme recognition performs a global search for solutions that avoids local minima and can estimate many parameters varying over wide ranges of values.
In our approach, the data are necessarily temporal, and the recognition tool is the recurrent Kohonen map RSOM, in which the time factor is introduced by its differential equation.
The genetic evolution of the map optimizes the recognition rate over a search space as large as possible. The search is guided by a cost function associated with the developed model, which reflects the effectiveness of the individuals in a given generation. The chosen function is described by the following expression [10], [11], [12]:
(2)
where $X_i(t)$ represents the characteristic coefficients of the input data vector and $W_i(t)$ represents the synaptic weight vector of each neuron.
This objective function provides a means of evaluating the scores of the individuals in a generation. Its value lies between 0 and 1 and increases as the map weights get closer to the input data, that is, as the difference between the input observations and the output solutions decreases.
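Because the expression of Eq. (2) is not recoverable from this text, the sketch below uses a hypothetical stand-in with the stated properties: a value in (0, 1] that equals 1 when every weight vector $W_i(t)$ matches its input $X_i(t)$ and decreases as they diverge. The form 1/(1 + mean squared distance) is our assumption, not the paper's formula.

```python
import math


def fitness(inputs, weights):
    """Hypothetical stand-in for Eq. (2): 1 / (1 + mean squared distance between X_i(t) and W_i(t)).

    Returns 1.0 for a perfect match and tends to 0 as weights move away from inputs.
    """
    mse = sum(math.dist(x, w) ** 2 for x, w in zip(inputs, weights)) / len(inputs)
    return 1.0 / (1.0 + mse)


if __name__ == "__main__":
    print(fitness([[1.0, 2.0]], [[1.0, 2.0]]))  # 1.0
    print(fitness([[0.0, 0.0]], [[3.0, 4.0]]))  # 1/26, since the squared distance is 25
```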

1) The selection technique
The individuals are selected for reproduction by random drawing according to the distribution given by the fitness function: the better a chromosome solution, the closer its fitness is to 1 and the higher its chance of being drawn. The random selection follows the empirical distribution of the relative fitness of the individuals. The selection algorithm is presented as follows, where fitness(j) is the score of candidate j and N is the number of candidates. The quantity Ps(i) lies between 0 and 1. We draw a random number between 0 and 1, then choose the candidate i whose Ps interval contains it. The selected candidates are then passed to the crossover and mutation operations [13], [14].
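Selection according to relative fitness is usually implemented as roulette-wheel selection. The sketch below assumes that Ps(i) is the cumulative relative fitness, $P_s(i) = \sum_{j=1}^{i} \mathrm{fitness}(j) / \sum_{j=1}^{N} \mathrm{fitness}(j)$, and that candidate i is chosen when the random number r satisfies $P_s(i-1) \le r < P_s(i)$; this is our interpretation of the missing formula. Candidates are indexed from 0 in the code, and the function names are ours.

```python
import bisect
import itertools
import random


def cumulative_probabilities(fitness):
    """Ps(i): cumulative relative fitness, rising from fitness[0]/total up to 1."""
    total = sum(fitness)
    return [c / total for c in itertools.accumulate(fitness)]


def select_index(ps, r):
    """Index i such that ps[i-1] <= r < ps[i] (bisect_right finds the first ps[i] > r)."""
    return min(bisect.bisect_right(ps, r), len(ps) - 1)  # clamp if rounding leaves ps[-1] < 1


def roulette_select(fitness, rng=random):
    """Draw r uniformly in [0, 1) and return the selected candidate index."""
    return select_index(cumulative_probabilities(fitness), rng.random())


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [roulette_select([0.2, 0.5, 0.9], rng) for _ in range(10000)]
    print([picks.count(i) for i in range(3)])  # roughly proportional to 0.2 : 0.5 : 0.9
```

Using a binary search over the cumulative table keeps each draw at O(log N), which matters little for small populations but makes the cumulative interpretation of Ps(i) explicit.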

2) Algorithm of the GA-RSOM proposed Model
Speech consists of a set of phonemes. Each phoneme is a sound atom characterized by a certain stationarity. This property limits the benefit of the RSOM tool, which incorporates the temporal aspect, for phoneme recognition. To overcome this constraint, our experiments use all the windows of the phonemic support instead of only the central window, where the maximum signal energy is concentrated. This preserves all the information of a phoneme, even in adverse environmental conditions. Similarly, using a recurrent dynamic model such as GA-RSOM for phoneme recognition helps capture the variability of speech. This idea also opens other research directions, such as isolated-word or keyword recognition, where recursive models are very attractive.
Moreover, our evolutionary model built around the RSOM map works as an adaptive tool: it evolves in order to optimize the recognition results. The algorithm begins by creating an initial population, which consists of initializing the neural weights of an RSOM map. Sequences of serial data are presented at the inputs for identification, and each coded phonemic vector is transmitted to every neuron of the map. After a learning or test phase, each neural unit specializes in the type of input vectors closest to it.
Our evolutionary model takes advantage of the diversity of the winning neural units (BMUs) obtained during the iterations of the RSOM maps.
The diversity of the BMUs provides chromosome diversity among the individuals of an RSOM population. This diversity increases the chance of expanding the search space and of producing the best descendants, which ensure the survival of the fittest over the generations until a stopping criterion is met, resulting in an improved recognition rate.
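The evolutionary loop described here (selection, crossover, mutation, repeated over generations) can be sketched as below. This is a generic real-valued GA under our own assumptions: fitness-proportional parent selection, one-point crossover, Gaussian mutation, elitism that keeps the best individual, and a fixed number of generations as the stopping criterion. None of these operator choices or parameter values comes from the paper, whose pseudo-code mentions a geometric selection; all function names are hypothetical.

```python
import random


def one_point_crossover(p1, p2, rng):
    """Child takes genes [0, cut) from p1 and [cut, end) from p2."""
    cut = rng.randrange(1, len(p1))  # cut strictly inside the chromosome
    return p1[:cut] + p2[cut:]


def mutate(child, rate, scale, rng):
    """Add Gaussian noise to each gene with probability `rate`."""
    return [g + rng.gauss(0.0, scale) if rng.random() < rate else g for g in child]


def next_generation(population, fitness_fn, rng, rate=0.05, scale=0.1):
    scores = [fitness_fn(ind) for ind in population]
    elite = population[scores.index(max(scores))]
    new_pop = [list(elite)]  # elitism (our assumption): the best individual survives unchanged
    while len(new_pop) < len(population):
        p1, p2 = rng.choices(population, weights=scores, k=2)  # fitness-proportional parents
        new_pop.append(mutate(one_point_crossover(p1, p2, rng), rate, scale, rng))
    return new_pop


def evolve(population, fitness_fn, generations=50, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):  # fixed generation count as the stopping criterion
        population = next_generation(population, fitness_fn, rng)
    return max(population, key=fitness_fn)


if __name__ == "__main__":
    target = 0.5  # toy problem: genes should approach 0.5
    fit = lambda ind: 1.0 / (1.0 + sum((g - target) ** 2 for g in ind))
    init = random.Random(1)
    pop = [[init.random() for _ in range(6)] for _ in range(20)]
    print(fit(max(pop, key=fit)), fit(evolve(pop, fit)))
```

Elitism guarantees that the best fitness never decreases from one generation to the next, which makes the stopping criterion easier to reason about.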
The pseudo-code of our evolutionary algorithm consists of the following steps:

a) Linear initialization of the RSOM network to create an initial population.
b) Sequential presentation of the phonemic feature vectors to the RSOM map.
c) Prototyping of the BMUs representing the individuals of the population.
d) Computation of the individuals' scores with a fitness function.
e) Application of a geometric selection to choose the parents to be crossed.
f) Computation of the gene quality of each parent.
g) Crossover between parents to obtain a child.
h) Mutation of every child with a certain probability.
i) Looping from the selection step until a new population is produced.
j) Definition of an individual's BMU through the iterations.
k) Computation of the winning recognition rate.
l) Check of the algorithm's stopping criterion.

3) Experiment results
Our proposed model is evaluated on the phonemic class richest in vocal-cord vibration energy: twenty vowels extracted from the TIMIT database. A comparative study of different phoneme recognition tools, namely SOM, GA-SOM and GA-RSOM, was carried out to identify the model best suited to optimizing the phoneme recognition rate.
The resulting rates are listed in Tables II and III and plotted in Fig. 9 and Fig. 10. The recursive evolutionary GA-RSOM model promises better results, since it integrates a dynamic aspect that matches the variability of the phonemic support.

V. MODELING SOM ON FPGA
1) Similarity factors between the SOM map and the FPGA
An FPGA is an integrated circuit based on configurable logic blocks (CLBs), a programmable connection matrix and RAM blocks, used to implement complex digital computations [15], [16], [17]. The CLB is the configurable element that makes the FPGA a configurable device that can be used for hardware verification. Each CLB has a logic architecture with a static combinational part and a dynamic part used for memorization, so it can emulate a neuron, which is characterized by its memory, erasing and re-memorization abilities. This property suggests shaping CLBs to mimic the behavior of the SOM map. The main feature of the Kohonen neural network is that each of its neurons is connected to all the other neurons. Likewise, on an FPGA, each CLB can be linked to all the other CLBs through the programmable connection matrix.
The Kohonen algorithm also suits the FPGA because both operate in cycles: each iteration of the Kohonen algorithm can be mapped to one clock cycle of the FPGA implementation.

2) Modeling strategy of SOM on FPGA
This work starts by modeling an artificial neuron and implementing it in hardware, on a Xilinx FPGA. It then models the whole SOM and implements it on the FPGA. The SOM is built from a basic artificial neuron unit, also called a processor. Each unit has several inputs, which mimic the dendrites of a biological neuron, and one output, which mimics the axon and spreads information to other neurons. The cells of the competitive neural layer of the SOM are grouped according to their learning similarities, so that neighboring cells have almost the same characteristics.
The operating principle of each artificial neuron is described by the following algorithm. Once each neuron of the SOM map has run this algorithm, all the activated neurons are compared sequentially, one by one, with the input vector.
The neuron that best represents the input vector is the winner, called the BMU (Best Matching Unit) of the current iteration. It is the neuron with the minimum Euclidean distance between the input vector coefficients and its codebook weight vector at the input of the soma. This idea is clarified by the following algorithm. The behavior of our modeled SOM system is then coded on the FPGA.

Because the waveform that stimulates a biological neuron is a train of spikes, we represent it with square pulses, as shown in Fig. 15; being digital, such a signal is suitable for processing by every CLB of the FPGA.
Fig. 15. Representation of spikes by square pulses to be processed by neurons
When a spike arrives, the soma has two functions: to generate the action potential from the input data, and to check whether the sum of all action potentials at that instant exceeds a threshold, in which case it generates a pulse through the axon.
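The sequential comparison that yields the BMU can be sketched as a single scan over the units. Comparing squared distances gives the same winner as comparing Euclidean distances and avoids a square root, which is convenient in hardware. The function name `find_bmu` and the tie-breaking rule (the earlier unit wins) are our assumptions.

```python
def find_bmu(x, codebook):
    """Scan the units one by one and keep the one with the smallest Euclidean distance to x."""
    best_index, best_d2 = None, float("inf")
    for i, w in enumerate(codebook):
        d2 = sum((xk - wk) ** 2 for xk, wk in zip(x, w))  # squared distance: same argmin
        if d2 < best_d2:  # strict '<': on a tie, the earlier unit is kept
            best_index, best_d2 = i, d2
    return best_index


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(find_bmu([0.2, 0.8], codebook))  # 1
```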
To ease the neuron design task, we apply a top-down method. It divides a complex design into simpler designs or modules, each of which is refined in more detail or divided into subsystems. The system is first viewed as a whole; going down through the subsystems then shows in more detail how each block works. In the developed system, each artificial neuron has only three dendrites as inputs, to simplify the task. To process information carried by analog signals, these input dendrites must be preceded by an analog-to-digital converter (ADC). The received pulses are therefore square waves, as defined previously.
The received binary signal is weighted by the weight of each dendrite, then summed and thresholded by an activation function defined in the neuron unit algorithm. In our system, the SOM map is represented by four neural units on four D flip-flops. The four outputs are compared by a comparator, and the first activated D flip-flop is considered the Best Matching Unit (BMU): it holds the information closest to the input vector.
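A behavioral sketch of this four-unit map follows: each neuron weights its three binary dendrites, sums them and applies the threshold; the result is latched into that neuron's D flip-flop on a clock edge; and a comparator reports the first active flip-flop as the BMU. The threshold of 0.5 matches the simulation of Section VI. The per-unit weights and the input stream in the example are hypothetical, and this Python model is not the authors' VHDL design.

```python
def neuron_output(inputs, weights, threshold=0.5):
    """Weighted sum of the binary dendrite inputs followed by a hard threshold."""
    s = sum(i * w for i, w in zip(inputs, weights))
    return 1 if s > threshold else 0


def simulate_map(input_stream, unit_weights, threshold=0.5):
    """Clock the map once per input triple; return (cycle, unit) of the first BMU, else None."""
    for cycle, inputs in enumerate(input_stream):  # one input triple per rising clock edge
        # Each unit's thresholded output Y is latched into its own D flip-flop.
        q = [neuron_output(inputs, w, threshold) for w in unit_weights]
        for unit, bit in enumerate(q):  # comparator: lowest-index active flip-flop wins
            if bit:
                return cycle, unit
    return None


if __name__ == "__main__":
    weights = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]  # hypothetical per-unit weights
    stream = [(0, 0, 0), (0, 1, 0), (1, 1, 1)]  # (i1, i2, i3) at successive clock edges
    print(simulate_map(stream, weights))  # (1, 1)
```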
A deeper analysis adds other important blocks. More inputs are needed so that the system can learn quickly, and a clock signal must be introduced: since all the signals are digital, a signal that synchronizes the whole system is recommended.
The implementation of the FPGA device under Xilinx follows the diagram below. The specification step comprises the choice of the logic device and a syntactic specification in VHDL, in Verilog, or in graphical mode.
Functional validation relies on a functional simulation of the design, to detect problems with inputs/outputs, loops, etc. This check does not take the timing aspects of the device into account.
Timing validation includes the timing and functional simulation of the device created on the FPGA, covering aspects such as propagation delay and signal overlap.
At the implementation stage, the program is physically loaded onto the created circuit, as an FPGA project, according to the programmer's specifications for pin allocation and internal behavior.

VI. SIMULATION RESULTS
Our experiment is based on an FPGA of the Cyclone II family (an Altera device family). We then built the adopted SOM map model on this FPGA in VHDL, following the design protocol developed in the previous section. The simulation results of a SOM map made of four neural processor units on the FPGA are shown in Fig. 22 and Fig. 23.
During this simulation, we set all the weights w1, w2 and w3 to 1, and the clock signal (clk) has a period of 10 ms.
We also chose different periods for i1, i2 and i3, the three dendrites of a neuron; S1, S2, S3 and S4 are the neuron outputs. At the first marker, at 15 ms in Fig. 22, although i1 equals 1 and the clock is on its rising edge, the output S1 remains 0 because e1 is not enabled.
At the second marker, at 55 ms, the first dendrite is excited (i1 = 1), the clock is on its rising edge and e1 is enabled, so a result is generated on S1.
The simulation also shows that at 115 ms the sum of the three signals S1, S2 and S3 does not exceed the neuron threshold (threshold = 0.5), so the output Sout remains 0. By contrast, at the next rising edge, at 125 ms, the sum of these outputs exceeds the neuron threshold, which activates a neuron, the BMU, as shown in Fig. 23, and Sout becomes 1.
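The behavior at the first two markers is that of a positive-edge D flip-flop with a clock enable. The sketch below models it at the behavioral level, treating i1 (with weight 1) as the D input of the flip-flop that drives S1. The class name is hypothetical, and the calls only reproduce the two situations described, not the exact waveforms of Fig. 22.

```python
class DFlipFlopEnable:
    """Positive-edge D flip-flop with clock enable: Q takes D only on a rising edge when E = 1."""

    def __init__(self):
        self.q = 0
        self._last_clk = 0

    def tick(self, clk, d, enable):
        rising = clk == 1 and self._last_clk == 0  # detect a 0 -> 1 clock transition
        if rising and enable:
            self.q = d
        self._last_clk = clk
        return self.q  # Q holds its value between enabled rising edges


if __name__ == "__main__":
    ff = DFlipFlopEnable()  # models S1, with D driven by i1
    ff.tick(0, 1, 0)
    print(ff.tick(1, 1, 0))  # 0: rising edge, i1 = 1, but e1 disabled (first marker)
    ff.tick(0, 1, 1)
    print(ff.tick(1, 1, 1))  # 1: rising edge, i1 = 1, e1 enabled (second marker)
```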

VII. CONCLUSION
In this paper, we developed an evolutionary GA-RSOM model. Its experimental evaluation gives promising results (Tables II and III) compared with the mean recognition rates of other models for static data, such as the SOM, the GHSOM and the GA-SOM.
By applying a recursive loop to the SOM, we introduced the dynamic temporal aspect of the resulting RSOM model.
Similarly, we regarded the best matching unit (BMU) obtained at each RSOM iteration as a chromosome carrying the characteristics of an individual, selected from a diverse population by applying a genetic algorithm (GA).

Fig. 1. Recognition Rates over different types of coding [2].
- Classification of the MFCC coefficients into lists of macro classes.
- Determination of the list of phonemes in each class.

Fig. 3. Principle of the Elman SRN structure, whose feedback loop introduces the idea of self-reference
Thus, the output of each neuron in the output layer can be modeled by a leaky integrator based on an active low-pass filter, which integrates the temporal aspect through the charging and discharging of the storage capacitor.

Fig. 4. Modeling of neuron outputs in the RSOM map by an electrical lowpass filter

Fig. 8. UML (Unified Modeling Language) diagram of the adopted strategy
a) Calculate the fitness of each candidate for the selection.
b) For each candidate i, associate the value:

Fig. 9. Comparative Rates between Models for Vowels Test

Fig. 11. Comparative block diagrams of the SOM and the FPGA

Fig. 13. The UML Diagram of an artificial neuron behavior

Fig. 14. The UML Diagram of a SOM map behavior

Fig. 16. A basic model of the neuron
Each neuron unit is likened to a memory cell, such as a D (Data) flip-flop. It is therefore regarded as a processor unit that can handle only binary information.

Fig. 17. Modeling of the different I/O ports of a formal neuron
Each neuron is modeled by a block of multiplier stages that multiply the sample coefficients of the input signal by the appropriate weight of each dendrite, followed by a block that sums the obtained products and a threshold that makes the neuron activation decision. This is illustrated in the following figure.

Fig. 18. Representation of the neuron's internal behavior
The output Y of the previous diagram is used as the input of a D (Data) flip-flop.

Fig. 19. Modeling of the SOM map by the D flip-flops and a comparator

Fig. 21. The schematic of the Cyclone II FPGA

Fig. 22. The simulation result of SOM on FPGA without activated neuron

TABLE I.
RECOGNITION RATE OVER SOM FOR TIMIT DATABASE PHONEMES

TABLE II.
VOWELS RECOGNITION RATE FOR TIMIT TRAINING BASIS

TABLE III.
VOWELS RECOGNITION RATE FOR TIMIT TEST BASIS

Fig. 10. Explanatory Diagram of Comparative Rates between Models

These results show that the SOM model is limited to static data. Hybridizing the SOM with the GA gives only a slight improvement in recognition rate, because the model is still built around an SOM core that can handle only static data.