Integrated Pairwise Testing based Genetic Algorithm for Test Optimization

Test case generation is an important and complex activity in software testing because it deals with a diverse range of inputs. Fundamentally, test case generation is a multi-objective problem, as it aims to cover many targets. Deriving test cases for web applications has become critical for most enterprises. This paper proposes a solution for generating test cases for web applications. The solution uses a System Graph (consisting of links and data dependencies), on the premise that test cases are based on combinations of input values and data dependencies. Pairwise testing is used to select the test cases to be executed from the full set of test cases, and a genetic algorithm is then proposed to generate test cases specific to functional testing. The proposed approach was tested in two distinct experiments by measuring code coverage at every generation, and the results show that the genetic algorithm increased the fitness value and code coverage. Overall, the results validate the proposed approach and algorithm, which has the potential to be developed further into an automated, integrated solution for generating test cases for the entire process.

Keywords: Test case generation; genetic algorithm; multi objective optimization; pairwise testing; test optimization; fitness value


I. INTRODUCTION
The main aims of applying soft computing to testing are to maximize the quality of software testing and to automate the test generation process. The search problem consists of finding the best solutions among competing candidate solutions, with a fitness function used to evaluate each candidate. Software testing is not limited to testing an application or system in isolation; it includes checking the system in its entirety. Genetic Programming (GP) [1] is a type of evolutionary algorithm, inspired by biological evolution, that searches for programs that perform user-defined tasks. This technique has been applied successfully to many hard problems in software testing, such as automated design, pattern recognition, and test suite generation [2]. This family of algorithms helps to automate the generation of basic test paths, which involves several problems such as data generation, sequence generation, test case derivation, and optimization. Recent studies describe a regeneration genetic algorithm that is reported to be effective and simple for coverage-oriented test suite generation. Issues related to software reusability can be addressed by combining soft computing approaches such as neural networks with software testing. Soft computing techniques such as the Genetic Algorithm (GA) are well suited to test size and coverage problems. Efforts have been made to develop the best possible solutions for automating test suite and test sequence generation. GA is used for problems such as test sequence and test data generation in white box testing and functional testing. In current software development, test automation plays a significant role in testing software in its entirety. Test automation comes with its own challenges, which include generating reusable scripts, recompiling test scripts with modifications for different runs, and developing tests rapidly with minimal time and effort.
Traditional methods, such as randomized approaches and goal-oriented techniques, involve human intervention, development effort, and cost. Limited resources, missed critical requirements, and the generation of redundant test cases are the prominent constraints in test case generation. To overcome these challenges, test case generation methods need enhanced algorithms.

II. RELATED WORK
Test case generation techniques are classified as specification based, which use specification documents to derive the test cases; sketch based, which commonly work with diagrams such as UML; and source code based, in which test cases are derived from source code and which apply to white box testing [3]. The literature treats test case generation as a complex problem for which various strategies have been proposed. GA, GA-NN, and MA algorithms were applied in [4], which applies machine learning techniques to the test generation process. Sketch based test case generation in combination with UML diagrams and state transition diagrams was proposed in [5]. Test case generation combined with soft computing techniques such as Genetic Algorithm, Particle Swarm Optimization, and Artificial Bee Colony produced suitable results [6][7][8]. The proposed approach treats test case generation as a combinatorial optimization problem in which the best feasible solution lies in a discrete set. Combinatorial solutions in the literature follow different approaches: single objective optimization and multiple objective optimization. A test case is a set of combinations of input values that is run on a scenario to produce a result, which is then evaluated. Hence the test case problem is the tuple (T, U, M, F), where T is the set of test instances and can be considered a test set; U is the finite set of feasible solutions from the suite; M is a measure m defined on a feasible solution y for a given instance x; and F is a fitness function f, usually considered to be a goal function, which can be either min or max.
Combinatorial approaches are an important class of exact discrete optimization methods. These methods use the successive analysis and elimination of alternatives due to Mikhalevich and Shor [9][10][11]; this scheme gives rise to a family of further schemes. For instance, dynamic programming approaches and branch and bound procedures can be defined within its framework. Test case generation as a single objective optimization [12] aims at achieving the maximum fitness value, so that the derived test suite has a high probability of achieving good code coverage. Let T be the set of test cases $\{T_1, T_2, T_3, \ldots, T_n\}$ and t the set of targets; the optimization problem is to generate a test suite that maximizes the fitness value.
Test case generation as a multi objective optimization [13] aims to maximize the fitness value while considering, along with code coverage, other parameters such as dependencies and all coverage criteria:
$\max F_i(X) = C_1(T_1, X), C_2(T_2, X), \ldots, C_n(T_n, X)$
where $C_1$, $C_2$, and $C_n$ are the various coverage criteria. Differential evolution techniques [14] use a multidimensional real valued function and generate a new population from existing solutions with a simple formulation.

III. METHODOLOGY
The current work proposes a System Graph (a graph representing the web application, annotated with the number of link dependencies and the number of data dependencies) for a program under test, in which the data and link dependencies of the web page are captured. As test cases deal with combinations of input values, the data dependencies [15] play a vital role. Further, the link dependencies on the web page are prioritized. In combinatorial testing, pairwise testing (the two-way case of t-way testing) consists of choosing a subset of test cases that covers all possible pairs of parameter values. This decreases the total number of test cases to be executed and increases test efficiency by spreading coverage across the input space while using a small set of test cases. It is especially effective as the number of variables grows, since it drastically reduces the number of test cases. Pairwise testing [16,17], which has been shown to reduce the number of test cases considerably, is therefore the next step in the process, reducing the combinations of input value pairs in the generated test suite. The proposed GA, shown in Fig. 1, generates and optimizes test cases specific to functional testing. GA is one of the randomized search procedures developed to emulate the process of natural selection and natural genetics, and it has been shown to produce high-quality solutions to optimization problems.
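The reduction achieved by pairwise testing can be illustrated with a minimal sketch. It assumes a simple greedy all-pairs strategy and hypothetical form parameters (`browser`, `role`, `payment`); it is not the online tool used in the experiments, and it enumerates every combination, so it is only practical for small inputs.

```python
from itertools import combinations, product


def covers(test, pair):
    """True if the test assigns both values named in the pair."""
    (name_a, value_a), (name_b, value_b) = pair
    return test[name_a] == value_a and test[name_b] == value_b


def pairwise_suite(parameters):
    """Greedy all-pairs selection; parameters maps a name to its list of values."""
    names = list(parameters)
    # Every value pair of every two parameters must appear in at least one test.
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in parameters[a]
        for vb in parameters[b]
    }
    # Exhaustive candidate list (assumption: small enough to enumerate).
    candidates = [dict(zip(names, values)) for values in product(*parameters.values())]
    suite = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda t: sum(covers(t, p) for p in uncovered))
        suite.append(best)
        uncovered = {p for p in uncovered if not covers(best, p)}
    return suite


# Hypothetical input parameters of a web form.
params = {
    "browser": ["Chrome", "Firefox"],
    "role": ["admin", "guest"],
    "payment": ["card", "cash"],
}
for test in pairwise_suite(params):
    print(test)
```

For these three two-valued parameters the exhaustive suite has 8 combinations, while the greedy suite covers every value pair with fewer tests.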
The System Graph can be constructed either from source code analysis [18,19] or from the web page itself. Since data flows from one node to another, the test cases remaining after pairwise testing are considered, and the same set of test cases is represented as a graph, which serves as the logical representation of a gene in the encoding phase of the GA. Once the graph yields the required paths, as in Fig. 2, these paths are treated as test cases and the genes are constructed.
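As an illustration of how paths in a System Graph can become genes, the sketch below enumerates paths in a small adjacency-list graph. The page names and edges are hypothetical, and restricting the search to paths without repeated pages is an assumption made here to keep the enumeration finite.

```python
def enumerate_paths(graph, start, goal, max_len=6):
    """Depth-first enumeration of paths from start to goal without repeated pages."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if len(path) >= max_len:
            continue  # bound the path length
        for successor in graph.get(node, []):
            if successor not in path:  # assumption: no page is revisited
                stack.append(path + [successor])
    return paths


# Hypothetical System Graph: each page maps to the pages it links to.
system_graph = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4"],
    "P4": [],
}

# Each path is one candidate gene (test case).
genes = enumerate_paths(system_graph, "P1", "P4")
for gene in genes:
    print("→".join(gene))
```

Each returned list is one page sequence that can be encoded as a gene.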

A. Parameters Considered
Fitness value: Fitness assigns a value to each candidate solution and plays a vital role in exploring the search space. The fitness value [20] guides the whole Genetic Algorithm, or a PSO algorithm, in selecting the fittest individuals.
Code Coverage: Code coverage [21] is a measure of how much of the system's source code is covered by the test suite.
Branch Coverage: Branch coverage [22] is another important measure, which ensures that each branch is covered at least once by the selected path or paths, with both the true and false outcomes executed.
Dependency Coverage: Data dependencies and link dependencies [23] drive the largest number of test cases, as pages transition throughout the web application.
145 | Page, www.ijacsa.thesai.org, (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 4, 2021

IV. STRUCTURE OF GENETIC ALGORITHM
The Genetic Algorithm starts with an initial population of genes, which are test cases. The fitness function is computed over the population to select the set of chromosomes that will participate in the next generation. Crossover and mutation operators are applied to the selected population to generate a diverse population. The GA stops once the population has converged or after a specified number of iterations.

A. Genetic Algorithm
GA is a parameter coding technique that works on a population of solutions and uses probabilistic rather than deterministic transition rules. Treating test case generation as a multi objective optimization problem, Pareto [24] solutions and a multicriteria decision-aid technique are applied to select the best solution. The PROMETHEE decision technique [25] is applied to rank the individuals. The positive ranking is given in Eq. 1, which expresses the extent to which each alternative outranks all the others.
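For illustration, the sketch below computes the positive outranking flow in its standard PROMETHEE form, phi+(a) = (1 / (n - 1)) * sum over x != a of pi(a, x), where pi(a, x) is the weighted sum of per-criterion preferences. This is the textbook formulation and may differ in detail from the paper's Eq. 1. The "usual" preference function (1 if strictly better, otherwise 0), the criterion weights, and the scores are assumptions, and both criteria are assumed to be maximized.

```python
def positive_flows(scores, weights):
    """Positive outranking flow for each alternative.

    scores maps an alternative to a tuple of criterion values (higher is better);
    weights holds one weight per criterion and is assumed to sum to 1.
    """
    n = len(scores)
    flows = {}
    for a, values_a in scores.items():
        total = 0.0
        for x, values_x in scores.items():
            if x == a:
                continue
            # Usual preference function: full preference on any strict improvement.
            total += sum(w for w, va, vx in zip(weights, values_a, values_x) if va > vx)
        flows[a] = total / (n - 1)
    return flows


# Hypothetical test cases scored on (code coverage, number of dependencies).
scores = {"TC1": (0.8, 3), "TC2": (0.6, 5), "TC3": (0.5, 2)}
weights = (0.7, 0.3)  # assumed criterion weights

ranking = sorted(positive_flows(scores, weights).items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```

Alternatives with a larger positive flow outrank the others more strongly and would be ranked first.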

B. Genetic Algorithm Encoding
Each chromosome is encoded as a combination of pages and the data flow from each element to another element in a web page. We use a graph data structure to represent the paths and web pages. The data flow from one element to another, typically from one page to another, is represented as P1→P2→P3…Pn. From the graph below, sample genes are considered:

C. Fitness Function and Selection Mechanism
Tournament based selection [26,27] is preferred over roulette wheel selection to reduce the risk of missing test cases. The primary fitness value is derived from the evaluation criterion of code coverage: a selected set of test cases that achieves the maximum code coverage is assigned a high selection probability. The secondary fitness value depends on the number of data dependencies and link dependencies of the given nodes. The probability that an individual gene with fitness f wins a tournament of t individuals picked from the whole population of the test suite is given in Eq. 2,
where P(F) denotes the probability and S denotes the genes having a lower fitness score. The expected tournament success for a tournament of size s is specified in Eq. 3.
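Tournament based selection can be sketched as follows. The gene names and fitness values are hypothetical, and the sketch draws contestants without replacement, which is one common variant and an assumption here.

```python
import random


def tournament_select(population, fitness, size, rng):
    """Draw `size` distinct contestants at random and return the fittest one."""
    contestants = rng.sample(population, size)
    return max(contestants, key=fitness)


# Hypothetical genes with precomputed fitness values.
fitness_table = {"geneA": 0.42, "geneB": 0.77, "geneC": 0.58, "geneD": 0.31}
population = list(fitness_table)

rng = random.Random(7)  # seeded only to make the run repeatable
parents = [tournament_select(population, fitness_table.get, 2, rng) for _ in range(4)]
print(parents)
```

A larger tournament size raises the selection pressure: when the size equals the population size, the fittest gene always wins, while the least fit gene can never win a tournament of two or more distinct contestants.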
A test case is given a higher fitness value depending on the functions below. The mutation process [28] maximizes the chance of exploring the complete search space: a predefined mutation probability [29][30] is calculated for each chromosome, and a score is generated at random and compared with the mutation probability to decide whether the chromosome is mutated. A sample of the test cases after the crossover and mutation operations follows.
From the above generated test cases:
TC21: P4→P6→P9→P6→P10 (TC1 and TC4)
Acceptance: As mutation and crossover involve a certain level of randomness, the offspring may or may not be superior to the parent chromosomes. Hence fitness needs to be calculated for acceptance.
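The crossover and mutation operators on path-encoded genes can be sketched as below. The splice-at-a-shared-page crossover and the replace-the-last-page mutation are illustrative choices rather than the exact operators of the proposed algorithm, and the graph and paths are hypothetical.

```python
import random


def crossover_at_shared_page(parent_a, parent_b):
    """Join the head of parent_a to the tail of parent_b at their first shared page."""
    shared = [page for page in parent_a[1:-1] if page in parent_b]
    if not shared:
        return list(parent_a)  # no common interior page: keep the first parent
    pivot = shared[0]
    head = parent_a[: parent_a.index(pivot) + 1]
    tail = parent_b[parent_b.index(pivot) + 1:]
    return head + tail


def mutate(path, graph, rng, probability=0.1):
    """Compare a random score with the mutation probability; on success, swap the
    last page for another successor of the preceding page so the path stays valid."""
    if len(path) < 2 or rng.random() >= probability:
        return list(path)
    alternatives = [p for p in graph.get(path[-2], []) if p != path[-1]]
    if not alternatives:
        return list(path)
    return list(path[:-1]) + [rng.choice(alternatives)]


# Hypothetical graph and parent test cases.
graph = {"P1": ["P2"], "P2": ["P4"], "P3": ["P4"], "P4": ["P5", "P6"]}
child = crossover_at_shared_page(["P1", "P2", "P4", "P6"], ["P3", "P4", "P5"])
print(child)
print(mutate(child, graph, random.Random(3), probability=1.0))
```

Because the offspring is not guaranteed to improve on its parents, its fitness still has to be evaluated before it is accepted.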
Stop criteria: The GA is executed for a specified maximum number of generations and is stopped based on the fitness and code coverage.
)/( + ))    Eq. (4)
where Ci is the code coverage of the test suite, and nd, nl, tnd, and tnl are as stated in the fitness and selection mechanism.
for each gene {gene i}
    If (fitness_value is in the range)
        Select the gene {gene i} based on Tournament based selection
        Apply crossover operation to generate the new genes
        Apply mutation operation to change the gene
        Add the above population to the current_population
End
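The loop above can be sketched end to end as follows. The fitness function here is a placeholder (the fraction of hypothetical target pages visited) and is not Eq. (4). The one-point crossover, the point mutation, and keeping only the fittest genes of the merged population are also assumptions made for illustration, and path validity is not checked.

```python
import random

PAGES = ["P1", "P2", "P3", "P4", "P5", "P6"]  # hypothetical page identifiers
TARGETS = {"P2", "P4", "P6"}                  # hypothetical pages to be exercised


def toy_fitness(gene):
    """Placeholder fitness: fraction of target pages visited (NOT the paper's Eq. 4)."""
    return len(TARGETS & set(gene)) / len(TARGETS)


def tournament(population, fitness, size, rng):
    return max(rng.sample(population, size), key=fitness)


def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def point_mutation(gene, rng, probability):
    gene = list(gene)
    if rng.random() < probability:  # random score compared with the mutation probability
        gene[rng.randrange(len(gene))] = rng.choice(PAGES)
    return gene


def run_ga(population, fitness, max_generations, rng,
           tournament_size=2, mutation_probability=0.2, target_fitness=1.0):
    population = [list(gene) for gene in population]
    size = len(population)
    for generation in range(max_generations):
        # Stop criteria: target fitness reached or generation budget exhausted.
        if max(fitness(gene) for gene in population) >= target_fitness:
            break
        offspring = []
        for _ in range(size):
            parent_a = tournament(population, fitness, tournament_size, rng)
            parent_b = tournament(population, fitness, tournament_size, rng)
            child = one_point_crossover(parent_a, parent_b, rng)
            offspring.append(point_mutation(child, rng, mutation_probability))
        # Acceptance: merge offspring into the current population, keep the fittest.
        population = sorted(population + offspring, key=fitness, reverse=True)[:size]
    return max(population, key=fitness)


initial = [["P1", "P3", "P5"], ["P1", "P2", "P3"], ["P5", "P3", "P1"], ["P3", "P4", "P5"]]
best = run_ga(initial, toy_fitness, max_generations=30, rng=random.Random(0))
print(best, toy_fitness(best))
```

Because the merged population is truncated to its fittest members, the best fitness in this sketch never decreases from one generation to the next.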

Experiment 1:
The triangle classification problem takes the three sides of a triangle as input and outputs the type of the triangle. SideA, SideB, and SideC for the first generation were chosen randomly, as specified in Table I; these values were then selected as parent chromosomes and underwent GA operations using the fitness function and pairwise testing described in the above algorithm. Pairwise testing values were obtained using an online pairwise tool. Code coverage from the second generation was noted and is specified in Table II. The NUnit coverage tool was used to record the code coverage of the test suite. The tables provide the data obtained as a result of the methodology in Fig. 1. The sample values, after processing and normalization, achieved the result shown in Fig. 4.

The source code of a simple web application was considered for experimental evaluation, and random test cases were generated. For this web based application, represented in Fig. 2, the automated test cases were captured using Selenium IDE. Selenium IDE is essentially a record and playback tool; the test cases it generates are saved and deployed as JUnit or NUnit test cases. The sample test cases were run through NUnit code coverage [31,32], which achieved the following result over the main modules, such as inserting and deleting customer records.
The proposed GA was then executed on the same set, considering a few sample test cases from the above document, which achieved the following result. At each iteration the fitness value is computed using the fitness function, and the genes are ranked according to the selection criteria discussed previously. Whether the genes are valid or invalid is checked manually, a step that can be automated in future. Hence the genes undergo a preprocessing phase for this check.
Considering the above-mentioned geneA, geneB, geneC, and geneD, a sample of the genes generated by the GA for three of the generations is given below.
Test cases derived for a sample of three generations
Generation 0: The random population considered from Fig. 3 is:
The above values were preprocessed, in which repeated genes and invalid genes were processed and further reduced. The validity of the genes was verified based on the data associated with them; for instance, the path P1→P2 is valid based on data that were minimized using pairwise testing.
Fig. 6 and Fig. 7 depict the graph of tests vs. coverage. The first initialization of the test suite is chosen randomly, with its values on the horizontal axis, and the line coverage and branch coverage are proportional. The intermediate tests did not achieve the coverage, but coverage stabilized as the Genetic Algorithm evolved.

VI. FUTURE SCOPE
Future work is to evaluate the proposed approach on large scale web applications and console programs. Although the current approach proposes an automated solution, the pairwise integration and the validation of the test cases for each run are done manually. The work can be extended into a completely automated, integrated solution for generating test cases for the entire process.

VII. CONCLUSION
This paper proposes an automated solution to the test case generation problem by means of an Integrated Pairwise Genetic Algorithm. The set of optimized test cases obtained after pairwise testing is used as the initial population for the GA. Considerably fewer genes were used initially, which gradually led to a large number of test suites. Code coverage was measured at every generation, and parent genes were selected based on fitness values and then took part in the generation process. When code coverage is compared between random generation of test cases and the GA, the results show that the GA considerably increased the fitness value and code coverage. Further, our work requires an automated, integrated solution for the whole process.