Evolutionary Method of Population Classification According to Level of Social Resilience

Following the many natural disasters and global socio-economic upheavals of the 21st century, resilience has become the subject of a growing body of research aimed at finding appropriate responses to these traumas. However, most existing work on resilience is limited to a broad, cross-disciplinary range of theoretical approaches that are not operational. The study of the processes of social resilience is thus confronted with modeling difficulties and a lack of appropriate analysis tools. Moreover, existing stratification methods are too general to take into account the specificities of resilience and are difficult for non-specialists in modeling to use. In addition, most traditional partitioning methods have limitations, including their inability to explore the search space effectively. In this paper, we propose a classification algorithm based on genetic algorithms and adapted to the context of social resilience. Our objective function, penalized by two criteria, allows a wide exploration of the search space while favoring classes that are homogeneous and well separated from one another.

Keywords: genetic algorithm; unsupervised classification; social resilience; partitioning method


I. INTRODUCTION
Resilience is a polysemic concept studied in several fields, including sociology, ecology, economics, computer science and psychology. Originating in the physics of materials, where it designates the ability of a system to return to its initial equilibrium after a deformation, resilience is now the subject of much research. However, an analysis of the literature in this area reveals a lack of operational approaches. This paper is a contribution to the operationalization of the concept of social resilience, which the French ethologist Boris Cyrulnik defines as the ability of a person, a social group or an environment to overcome suffering or trauma [1].
One of the fundamental principles of clustering is to partition a set of objects so that the elements of the same group are as similar as possible and the various groups are distinct from one another. There are several families of classification methods, the most widely used of which are hierarchical methods and partitioning methods. These methods, however, have significant drawbacks. Hierarchical (agglomerative) methods are limited to small data sets because they store in memory a dissimilarity matrix whose size is quadratic in the number of objects. Partition-based methods, in addition to producing sub-optimal results that depend on the initial partition, explore only a small part of the solution search space. Other methods that explore this search space more widely are therefore needed. The genetic algorithms developed by John Holland [2] respond to this concern. These algorithms, inspired by the principles of neo-Darwinian natural evolution, are known for their effectiveness in exploring large and complex search spaces. They generally produce good solutions through the repeated application of a cycle of operations (selection, crossover, mutation).

One of the benefits of our proposal is its ability to identify the different dominant characteristic groups in a given population. It is thus well suited to the context of social resilience [3], especially to the study of social stratification within a population that has suffered a traumatic shock. For example, one application is the identification of the social groupings of a population according to the degree of resilience of each individual facing a traumatic shock.
In this paper, after presenting genetic algorithms and some related work, we present our proposal, followed by a conclusion.

A. Principle
Genetic algorithms belong to the family of stochastic optimization algorithms [4] [5]. They model natural evolution in order to solve a search problem. Their goal is to evolve a set of solutions towards an optimal solution. To do so, the algorithm randomly generates a population of individuals (chromosomes) and then iterates, generating new individuals by applying selection, crossover and mutation operators until a stop criterion is reached. Before selection, an evaluation function scores each candidate chromosome. Following this evaluation, a sub-population of winning chromosomes is retained for reproduction. The crossover and mutation operations are carried out according to a crossover probability (Pc) and a mutation probability (Pm), respectively.
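The cycle described above can be sketched as follows. This is a minimal illustration, not the algorithm proposed in this paper: the function name `evolve`, the toy fitness (the number of 1s in the chromosome), the parameter values and the choice of keeping the better half of the population as parents are all assumptions made for the example.

```python
import random

def evolve(fitness, length, pop_size=20, generations=50, p_c=0.6, p_m=0.05, seed=0):
    """Generic GA loop: evaluate, select, cross, mutate, until the stop criterion."""
    rng = random.Random(seed)
    # Random initial population of binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # stop criterion: fixed number of generations
        scores = [fitness(c) for c in pop]
        # Selection (assumed here: keep the better half as parents).
        ranked = [c for _, c in sorted(zip(scores, pop), key=lambda t: t[0], reverse=True)]
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = a[:]
            if rng.random() < p_c:  # crossover applied with probability Pc
                cut = rng.randint(1, length - 1)
                child = a[:cut] + b[cut:]
            # Mutation: flip each gene with probability Pm.
            child = [1 - g if rng.random() < p_m else g for g in child]
            children.append(child)
        pop = parents + children
        best = max(pop + [best], key=fitness)
    return best

# Toy usage: maximize the number of 1s in a 16-bit chromosome.
best = evolve(sum, 16)
```

Because the parents are carried over unchanged, the best individual found never gets worse from one generation to the next in this sketch.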

B. Genetic operators
-The selection operator: It chooses the parents for reproduction according to an evaluation function called fitness. Generally, for a population of n individuals, n/2 are selected for reproduction through the crossover step. Several selection techniques are distinguished in the literature, the most used of which are roulette-wheel selection, tournament selection, rank selection (ranking) and stochastic universal sampling [6][7][8].
-The crossover operator: This operator crosses the n/2 pre-selected parents to generate new children that have characteristics of their parents. It thus completes the population from n/2 individuals to n individuals. Crossover is applied with a probability Pc, which increases with the number of crossover points. Three main crossover operators are distinguished: one-point crossover, n-point crossover (n ≥ 2) and uniform crossover.
-The mutation operator: This operation randomly modifies the value of an allele with a mutation probability Pm, which is generally very low. Too high a mutation probability could lead to a suboptimal solution.
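To make some of the operators listed above concrete, here is a hedged sketch of tournament selection, n-point crossover and uniform crossover on list-based chromosomes. The function names and signatures are illustrative assumptions, and the 0.5 swap probability in the uniform crossover is a common convention rather than something stated in this paper.

```python
import random

def tournament_select(population, fitness, k, rng):
    """Return the fittest of k individuals drawn at random."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)

def n_point_crossover(a, b, n_points, rng):
    """Swap alternating segments of two parents at n random cut points."""
    cuts = sorted(rng.sample(range(1, len(a)), n_points))
    child1, child2 = [], []
    prev, swap = 0, False
    for cut in cuts + [len(a)]:
        seg_a, seg_b = a[prev:cut], b[prev:cut]
        child1 += seg_b if swap else seg_a
        child2 += seg_a if swap else seg_b
        prev, swap = cut, not swap
    return child1, child2

def uniform_crossover(a, b, rng):
    """Exchange each gene between the two children with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2

rng = random.Random(1)
c1, c2 = n_point_crossover([0] * 8, [1] * 8, 2, rng)
```

With two cut points, the first and last segments of `c1` come from the first parent and the middle segment from the second.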
III. STATE OF THE ART
M. Merzougui et al. [9] propose an improvement of the unsupervised classification algorithm Isodata through its main parameters. Because the results of Isodata are intrinsically linked to one threshold above which a class is split and another below which two classes are merged, the authors use genetic algorithms to determine the optimal values of these two thresholds. This improves the quality of the algorithm. However, other parameters are still fixed empirically, such as the bounds of the membership interval of the chromosomes of the initial population. These choices continue to influence the results of the algorithm despite the gain in performance.
Stephane Legrand [10] proposes a genetic program to discover homogeneous and distinct subsets of data in a file called "Zoo". An individual is represented as a tree of logical formulas, each consisting of a variable number of predicates. Individuals are evaluated with a function based on a measure of the homogeneity (H) and a measure of the separability (S) of the data subsets: $\text{fitness} = H + \mu S$. The coefficient µ applied to the separability measure varies the relative weight of the two measures. The homogeneity H is the weighted average of the homogeneity of the various subsets, and the separability S is the weighted average of the distances between the centroids of the subsets. The convergence of the algorithm is not proved. Moreover, the arbitrary choice of the coefficient µ greatly influences the quality of the results.
Maulik et al. [11] propose a clustering method based on a genetic algorithm in which each element is assigned to the nearest centroid so as to form clusters. At each step, the centroids are recalculated as the average of the elements of the same group, and the inverse of the intra-group inertia is then computed so as to obtain a maximization problem. The authors represent individuals as k-tuples and encode the coordinates of the k centroids as real numbers. A population of P chromosomes is initialized randomly. The selection technique used is an elitist proportional roulette wheel, which retains the best candidate of the previous generation. Unlike the previous algorithm, it converges towards the global optimum. However, it does not address the question of degenerate classes (those having a single element) or the separability between classes.
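The centroid-based evaluation described in this paragraph can be illustrated as follows. This is a sketch based only on the description above, not the code of [11]: the function names are hypothetical, and the intra-group inertia is assumed to be the sum of squared Euclidean distances to the recomputed centroids.

```python
import math

def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda q: math.dist(p, centroids[q]))
            for p in points]

def recompute(points, labels, centroids):
    """Replace each centroid by the mean of its cluster (unchanged if the cluster is empty)."""
    new = []
    for q, c in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == q]
        if members:
            new.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new.append(tuple(c))
    return new

def fitness(points, centroids):
    """Inverse of the intra-group inertia, so that a better clustering scores higher."""
    labels = assign(points, centroids)
    centroids = recompute(points, labels, centroids)
    inertia = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return 1.0 / inertia if inertia > 0 else math.inf

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = fitness(points, [(1, 1), (9, 1)])   # chromosome encoding k = 2 centroids
bad = fitness(points, [(5, 0), (5, 2)])
```

Here the first chromosome separates the left and right pairs of points and therefore receives the higher fitness.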
Greene [12] proposes a method that generates hierarchies of partitions. It begins with a top-down method in which the initial population is subdivided into several subpopulations. The evaluation consists in optimizing a function that depends on the intra-group inertia, the inter-group inertia and the size of the groups formed. To limit the influence of the initial conditions, in particular the order in which objects are inserted into the tree, the author proposes to generate the best possible tree by applying a genetic algorithm. An initial population of trees is generated by choosing a random insertion order for the objects. The selection, crossover and mutation operators are then applied. Selection uses the elitist proportional roulette technique, in which the two best solutions are retained after the quality of each tree has been evaluated. For crossover, the best first-level branches of each tree are chosen. The algorithm handles objects that appear in two classes or are missing from the partition: in the first case, the object is kept in the best class, and in the second case, it is simply reinserted. Unfortunately, this algorithm provides no information on the optimality of the solution generated.

A. Motivations
To study the processes of social resilience, researchers often use classification methods that are poorly adapted to this domain because they do not respect certain specificities of the concept, particularly its unobservable, temporal and dynamic aspects.
Moreover, the most widely used classification methods have notable drawbacks, including their unsuitability for large data sets (for hierarchical algorithms) and their very limited exploration of the solution search space (for partitioning algorithms). All these limitations can contribute to biased results. We therefore propose a partitioning method hybridized with genetic algorithms for the classification of social resilience data. This method, in addition to taking into account the specificities of social resilience, can explore a large solution search space and can be applied to larger data sets. It can also be adapted to other fields of study. In this paper, the algorithm is applied to a real data set obtained from a survey of a sample of people in relation to the recent post-electoral crisis in Ivory Coast. The objective is to find the main sociological groupings caused by the trauma of this crisis within the population studied. In a broader case study, the results of our algorithm could be used by stakeholders to support decisions in favor of the resilience of traumatized individuals.

B. Notation
n: The number of objects to be classified;
T: The time horizon for estimating the resilience of individuals;
Q: The total number of classes;
$\Omega_{t:T}$: Set of objects to be classified according to the information collected over the period from t to T;
Pop(t): Population of individuals (chromosomes) at time t;
δ: The overall penalty rate;
A: Set of all classes whose size is less than or equal to 1;
B: Set of all non-homogeneous classes;
card(A): Cardinality of the set A.

C. Representation of Individuals
An individual is a partition into classes and is a potential solution to the problem. In the context of genetic algorithms, it is represented by a chromosome composed of genes. Each gene represents a class and consists of a sequence of binary digits (0, 1). In this paper, we use a presence/absence coding in which the presence of an object in a class is marked by the digit 1 and its absence by the digit 0.
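A presence/absence coding of this kind can be sketched as follows. The helper names `encode`, `decode` and `is_valid_partition`, and the six-object example, are hypothetical illustrations and do not reproduce the paper's own example.

```python
def encode(labels, n_classes):
    """Build one gene per class: bit i is 1 if object i belongs to that class."""
    return [[1 if l == q else 0 for l in labels] for q in range(n_classes)]

def decode(chromosome):
    """Recover the class index of each object from the genes."""
    n = len(chromosome[0])
    return [next(q for q, gene in enumerate(chromosome) if gene[i] == 1)
            for i in range(n)]

def is_valid_partition(chromosome):
    """Each object must be present in exactly one class."""
    return all(sum(column) == 1 for column in zip(*chromosome))

# Six objects assigned to three classes (hypothetical data).
chromosome = encode([0, 0, 1, 2, 1, 0], 3)
# chromosome == [[1, 1, 0, 0, 0, 1],
#                [0, 0, 1, 0, 1, 0],
#                [0, 0, 0, 1, 0, 0]]
```

The validity check matters because crossover and mutation applied to such a coding can place an object in two classes or in none, a situation that then has to be repaired or rejected.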

Example of the coding of a chromosome:
Consider a given set of 12

D. Our evaluation function
In order to obtain homogeneous classes, we propose an evaluation function that minimizes the ratio of intra-class inertia to total inertia. Since the classification often leads to empty classes or classes containing a single element, we penalize this evaluation function by a rate equal to the percentage of classes whose size is less than or equal to one. Moreover, in order to obtain homogeneous classes that are well separated from each other, we also penalize the objective function by the percentage of classes whose centers are relatively close. These two rates are combined into the global penalization rate δ, which is then applied to the evaluation function to give the penalized objective function.
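The following sketch illustrates one way such a penalized evaluation could be computed. Only the inertia ratio and the two penalty rates follow the description above. The way the two rates are combined into δ (their mean), the way δ is applied (multiplying the ratio by 1 + δ) and the `min_gap` threshold below which two centers count as close are assumptions made for illustration, since they are not specified here.

```python
import math

def centroid(points):
    """Center of gravity of a list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def inertia_ratio(points, labels, n_classes):
    """Intra-class inertia divided by total inertia (assumes the points are not all identical)."""
    g = centroid(points)
    total = sum(math.dist(p, g) ** 2 for p in points)
    intra = 0.0
    for q in range(n_classes):
        members = [p for p, l in zip(points, labels) if l == q]
        if members:
            g_q = centroid(members)
            intra += sum(math.dist(p, g_q) ** 2 for p in members)
    return intra / total

def penalty_rates(points, labels, n_classes, min_gap):
    """Return (delta_alpha, delta_beta) as fractions of the number of classes."""
    sizes = [labels.count(q) for q in range(n_classes)]
    delta_alpha = sum(s <= 1 for s in sizes) / n_classes
    centers = {q: centroid([p for p, l in zip(points, labels) if l == q])
               for q in range(n_classes) if sizes[q] > 0}
    # ASSUMPTION: a class is "close" if another center lies within min_gap.
    close = sum(
        any(r != q and math.dist(c, centers[r]) < min_gap for r in centers)
        for q, c in centers.items()
    )
    return delta_alpha, close / n_classes

def penalized_fitness(points, labels, n_classes, min_gap):
    """Lower is better; both the combination and the penalty form are assumptions."""
    delta_alpha, delta_beta = penalty_rates(points, labels, n_classes, min_gap)
    delta = (delta_alpha + delta_beta) / 2  # ASSUMPTION: mean of the two rates
    return inertia_ratio(points, labels, n_classes) * (1 + delta)  # ASSUMPTION

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
balanced = penalized_fitness(points, [0, 0, 1, 1], 2, min_gap=3.0)
lopsided = penalized_fitness(points, [0, 0, 0, 1], 2, min_gap=3.0)
```

In this toy example the balanced partition has both a lower inertia ratio and no penalty, whereas the lopsided one is penalized for its single-element class.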

E. The choice of parameters
• For the selection, we use the roulette method, which is similar to a lottery wheel on which each individual is represented by a sector proportional to its fitness value. At each turn of the wheel, each individual has a probability of being selected proportional to its fitness value: $p(I_i) = f(I_i) / \sum_{j=1}^{K} f(I_j)$.
• For the crossing of individuals, we use one-point crossover, with the cut point chosen randomly among the $l - 1$ possible points ($l$ being the length of a chromosome). We choose a crossover probability as advocated by Goldberg [13]; in our case, $P_{cr} = 0.6$.
• For the mutation, we opt for a mutation probability inversely proportional to the size of our population, i.e. $P_{mut} = 0.08$.
• As the stopping criterion of our algorithm, we use a fixed maximum number of iterations (generations).
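The parameter choices above can be sketched on flat binary chromosomes as follows. The function names are hypothetical, applying the mutation probability to each gene independently is an assumption, and the sketch does not repair chromosomes that stop being valid partitions after crossover or mutation.

```python
import random

P_CR = 0.6    # crossover probability used in this paper
P_MUT = 0.08  # mutation probability used in this paper

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness value."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    cumulative = 0.0
    for individual, f in zip(population, fitnesses):
        cumulative += f
        if pick <= cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off

def one_point_crossover(a, b, rng, p_cr=P_CR):
    """Cross two parents at one of the l - 1 possible cut points, with probability p_cr."""
    if rng.random() >= p_cr:
        return a[:], b[:]
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chromosome, rng, p_mut=P_MUT):
    """Flip each bit independently with probability p_mut (assumed per-gene)."""
    return [1 - g if rng.random() < p_mut else g for g in chromosome]

rng = random.Random(42)
parent = roulette_select(["a", "b", "c"], [0.2, 0.5, 0.3], rng)
child1, child2 = one_point_crossover([0] * 6, [1] * 6, rng)
mutated = mutate(child1, rng)
```

Roulette selection as written assumes non-negative fitness values for which larger is better, so an objective to be minimized would first need to be transformed, for example by taking its inverse.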

G. Results and interpretations
For the application of our algorithm, we use a real data set obtained from a survey of a sample of one hundred (100) individuals (see Table 1). The survey concerns the trauma caused to these people by the recent post-electoral crisis. The objective is to identify the significant groupings that can be obtained from this population in order to support decision-making. After simulations, the best classification result is obtained for 3 classes, with a Rand index of 0.89 after 150 iterations (generations) (see Table 2).

The following figures show the best groupings obtained for 3, 4, 5 and 6 classes, respectively.

V. COMPARISON OF OUR PROPOSAL WITH OTHER WORKS
These hybrid algorithms are very different from one another, which makes them difficult to compare. However, in the table below, we present some points of comparison.

We have proposed a hybrid partitioning algorithm for the identification of significant groups as a function of the levels of resilience. It generates partitions with a traditional partitioning method and then optimizes them with genetic algorithms to obtain the best possible partition: the one that most reduces intra-class inertia and favors well-separated classes while eliminating classes that have only one element. The results of our simulations show that the algorithm converges after 150 iterations to a solution that meets the stated objective. The Rand index obtained (0.89) reflects the good performance of our algorithm. In future work, we intend to extend this algorithm to areas of study other than social resilience in order to test its robustness.
$\xi_i^t$: Estimate of the resilience of individual $i$ at time $t$;
$\Omega$ at time $t$;
K: Population size (number of partitions);
M: Maximum number of iterations (generations);
Evaluation value of the individual $I_t$;
$g_q$: Center of gravity of the class $C_q$;
g: Center of gravity of the whole point cloud;
d: Euclidean distance;
$\delta_\alpha$: Percentage of classes whose size is less than or equal to 1 (the minimum size);
$\delta_\beta$: Percentage of classes whose centers are closely spaced.

According to the 3-class classification (see Table 2), 18 individuals are in the first class, 32 are in the second class and the remaining 50 are in the third class.
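For reference, the Rand index used to assess the classification measures the proportion of pairs of objects on which two partitions agree (both place the pair in the same class, or both place it in different classes). A minimal sketch follows; the function name and the toy labels are illustrative, and the reference partition against which the reported value of 0.89 was computed is not specified here.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Identical partitions up to a renaming of the classes give an index of 1.
same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

The index lies between 0 and 1 and does not depend on how the classes are numbered.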

TABLE I. EXTRACT FROM THE DATABASE USED

TABLE III. COMPARATIVE TABLE OF OUR ALGORITHM (ALGOGENE) WITH OTHER WORKS