Estimating the Number of Test Workers Necessary for a Software Testing Process Using Artificial Neural Networks

On-time, within-budget software project development is a challenge for software project managers. Software management activities include, but are not limited to, estimating project cost, developing schedules and budgets, meeting user requirements, and complying with standards. Recruiting development team members is a difficult problem for a software project manager. Since the largest cost in software development is manpower, software effort and associated cost estimation models are used to estimate the effort required to complete a project. This effort estimate can then be converted into dollars based on the proper labor rates. A development team needs to be selected not only at the beginning of the project but also during the development process. It is important to allocate the necessary team to a project and to distribute its effort efficiently during the development life cycle. In this paper, we present our initial idea of developing a prediction model for estimating the number of test workers required for a software project during the software testing process. The developed models take the test instance and the number of observed faults as inputs. Artificial Neural Networks (ANNs) successfully capture the dynamic relationships between the inputs and the output and produce accurate prediction estimates.


I. INTRODUCTION
"Software is a place where dreams are planted and nightmares harvested ... a world of werewolves and silver bullets." This quote from Brad Cox [1] describes the challenges for software project managers in the past as well as today. The software project manager needs the skills, techniques, and monitoring and control tools to meet the goal of a software development project. The goal is to complete software development within the agreed-upon cost, schedule and user expectations. Meeting this goal involves meeting a schedule and a cost through improving budget distribution, managing human resources and adapting to environment changes. Intelligent project management requires many talents and skills. In 1987, the IEEE standards provided the following definition of software project management: Software project management is the process of planning, organizing, staffing, monitoring, controlling, and leading a software project.
Software development has long been perceived as a risky business [2], [3]. A project manager can always try to predict the required resources and plan a schedule for a deliverable, but there is no guarantee that this is what will happen unless a careful monitoring and control plan is maintained. The manager's ability to identify risks in advance helps in planning additional time to recover and in reducing the consequent losses. According to Dr. Patricia Sanders, Director of Test Systems Engineering and Evaluation at OUSD, in her 1998 Software Technology Conference keynote address, 40% of the DoD's software development costs are spent on reworking the software, which in the year 2000 amounted to an actual loss of $18 billion. Furthermore, Sanders stated that only 16% of software development would finish on time and on budget. It was also stated in [4] that: Given that software-intensive projects are among the most expensive and risky undertakings of the 21st century, the investment in weapons from fiscal years 2003 through 2009 will exceed $1 trillion. Furthermore, many of the DoD's most important technology projects will continue to deliver less than promised unless changes are made. Improving how we acquire software-intensive systems is both long overdue and an imperative.
In fact, the software development process is all about people, methodologies and tools. This can be seen from the software development process shown in Figure 1. People have to understand the project requirements, develop a project plan, produce a design, deploy the project, test and validate the business requirements, and finally fix any bugs.
The software life cycle includes testing of the software system. The testing process requires significant effort and could cost over 50% of the project effort. It is defined as the process of executing a program with the intent of finding software bugs, errors or any defects [5]-[7]. It is also the process of validating and verifying that the developed software program works and satisfies the needs of stakeholders. Software testing requires a team of qualified personnel.
In this paper, we provide a non-parametric Artificial Neural Network (ANN) model for predicting the number of test workers required during the software testing process. The number of required test workers depends upon the count of faults (defects) observed at certain test instances. The model should be capable of accurately determining the required team size for testing and should also help project managers distribute the effort of their team across the various tasks required for the project. In Section II, we define the staff management problem. Statistical regression analysis is presented in Section III. An overview of soft computing techniques, and specifically Artificial Neural Networks, is presented in Section IV. The evaluation criteria for measuring the goodness of the developed models are presented in Section VI. The two case studies considered in this article are presented in Sections VII and VIII. Finally, we present the conclusion and future work.

II. STAFF MANAGEMENT
Estimating time, cost, and number of staff is an essential duty for project managers in all business enterprises, and especially for software projects. The manager needs to calculate an estimate for these main attributes early in the development process. This is not always an easy task. The role of a project manager is to manage, analyze and make decisions at all development phases according to the accessible resources. Estimating time, cost, and staff helps sustain the monitoring and controlling of project activities and, in the end, deliver quality. The field of software effort/cost estimation is concerned with providing an estimate of the expected cost, schedule, and manpower required to produce a software system. In fact, there are common problems which could occur whenever we build a software system. The source of these problems could be one of the following:
• Insufficient requirements for the project.
• The adequate number of staff needed for each project phase.
• The contribution of each staff member as a function of time.
• The source of staff, such as staff hiring, part-time hiring or consulting.
• The schedule for joining and leaving the project.
Staff management includes project cost. The manager needs to gather adequate information so that the estimated project cost can be computed. Many software effort/cost estimation models were proposed to provide high-quality estimates that assist a project manager in making the best decisions for a project [8], [9]. Many software cost estimation models were reported in the literature [10]-[13]. These models were used to help project managers estimate effort, time and cost.
Staff shortages are considered a source of either inefficient use of resources or delay in delivering the project. Computing the staff needed for a project depends on correct predictions of the project demand and of the expected date for the product to reach the market. Any delay might cause business loss or damage to the firm's reputation. Numerous methods were used to estimate and predict staffing needs, based on the firm's past experience, project types, and sales and manufacturing statistics [14]-[17].

III. STATISTICAL REGRESSION ANALYSIS
Statistical regression analysis establishes relationships between a set of independent variables and one or more dependent variables. The independent variables could be historical measurements of certain events in the past, while we want to estimate or predict a dependent variable at this instant of time or even in the future. Many techniques for carrying out regression analysis have evolved. Linear regression and ordinary least squares regression are parametric methods that use Least Squares Estimation (LSE) to estimate mathematical model parameters. COCOMO uses such regression methods.

A. Single Linear Regression
Regression analysis measures the degree of influence of the independent variables on a dependent variable. In the case of simple bivariate regression, where there is a single independent variable, the dependent variable can be predicted from the independent variable by the simple equation:

$$y = a + b\,x \tag{1}$$

where $a$ is a constant and $b$ is the slope. This model is linear in the parameters $a$ and $b$. $y$ is called the dependent variable and $x_i$, $i = 1, \dots, n$, are the observations of the independent variable. The goal is to find the relationship between the dependent and independent variables. To compute the regression coefficient for the single independent variable given in Equation 1, we use the formula:

$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{2}$$

where $\bar{x}$ is the mean (average) of the $x$ values and $\bar{y}$ is the mean of the $y$ values. The parameter $a$ is computed by the formula:

$$a = \bar{y} - b\,\bar{x} \tag{3}$$

Equation 2 can be expanded to be:

$$b = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \tag{4}$$
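As a minimal illustration of the least-squares fit described in this subsection, the following Python sketch computes the slope b and the constant a of the line y = a + b x from the sample means. The function name `fit_simple_regression` and the four data points are hypothetical and are not taken from the paper's data sets.

```python
def fit_simple_regression(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # constant (intercept)
    return a, b


# Hypothetical observations lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_simple_regression(xs, ys)
```

Because the illustrative points lie exactly on a line, the fit recovers a = 1 and b = 2; with noisy measurements the same code returns the line that minimizes the sum of squared residuals.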

B. Multiple Linear Regression
Equation 1 can be extended to the multivariate case as follows:

$$y_i = a_0 + a_1 x_{i1} + a_2 x_{i2} + \dots + a_n x_{in}$$

where $x_{ij}$ is the $i$th observation on the $j$th independent variable. To show how the parameter estimation process works, assume we have a system with four input variables $x_1, x_2, x_3, x_4$ and a single output $y$. The model can then be represented as:

$$y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4$$

To find the values of the model parameters $a_0, \dots, a_4$ we need to build what is called the regression matrix $\phi$. This matrix is built from the measurements collected in the experiment.
Thus, given a set of $m$ measurements, $\phi$ can be presented as follows:

$$\phi = \begin{bmatrix} 1 & x_{11} & x_{12} & x_{13} & x_{14} \\ 1 & x_{21} & x_{22} & x_{23} & x_{24} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{m1} & x_{m2} & x_{m3} & x_{m4} \end{bmatrix}$$

The parameter vector $\theta$ and the output vector $y$ can be presented as follows:

$$\theta = \begin{bmatrix} a_0 & a_1 & a_2 & a_3 & a_4 \end{bmatrix}^T, \qquad y = \begin{bmatrix} y_1 & y_2 & \dots & y_m \end{bmatrix}^T$$

In matrix form the model is $\phi\,\theta = y$, which would have the solution $\theta = \phi^{-1} y$ if $\phi$ were square and invertible. But since the regression matrix $\phi$ is in general not a square matrix, we reformulate the problem as the normal equation $\phi^T \phi\,\theta = \phi^T y$, so that the least squares solution for the parameter vector $\theta$ is as follows:

$$\theta = (\phi^T \phi)^{-1} \phi^T y$$

IV. SOFT-COMPUTING TECHNIQUES
Soft computing techniques were explored to build efficient effort estimation model structures [18], [19]. The authors in [20] explored the use of Artificial Neural Networks (ANNs), Genetic Algorithms (GAs) and Genetic Programming (GP) to provide a methodology for software cost estimation. ANNs were used for software engineering project management in [21]. The authors in [22] provided a detailed study on using Genetic Programming (GP), Neural Networks (ANNs) and Linear Regression (LR) in solving the software project estimation problem. Many data sets provided in [23], [24] were explored with promising results. A fuzzy COCOMO model was developed in [18].
In [12], the author provided a pioneering set of models modified from the famous COCOMO model, with interesting results. Later on, many authors explored the same idea with some modification [25]-[28] and provided a comparison to the work presented in [12]. The advantages of the Takagi-Sugeno (TS) technique for building a set of linear models over the domain of possible software Kilo Lines Of Code (KLOC) were investigated in [29]. The authors in [30] presented an extended work on the use of soft computing techniques to build a suitable model structure for improved estimation of software effort for NASA software projects. In doing this, Particle Swarm Optimization (PSO) was used to tune the parameters of the COCOMO model. The performance of the developed model was evaluated using a NASA software projects data set. A comparison between COCOMO-PSO, Artificial Neural Networks (ANNs), Halstead, Walston-Felix, Bailey-Basili

... a neural network is a system composed of many simple processing elements operating in parallel whose function is determined by network structure, connection strengths, and the processing performed at computing elements or nodes.
According to Nigrin (1993, p. 11), an ANN is defined as: A neural network is a circuit composed of a very large number of simple processing elements that are neurally based. Each element operates only on local information. Furthermore, each element operates asynchronously; thus there is no overall system clock.
ANNs can exhibit many brain-like behaviors such as learning, association, generalization, feature extraction, optimization and noise immunity. The basic unit of any ANN is the perceptron, which is presented in Figure 3.
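To make the idea of a perceptron as a basic processing unit concrete, the following Python sketch computes a weighted sum of the inputs plus a bias and passes it through a hard threshold. The threshold activation, the function name `perceptron_output`, and the weights (chosen so that the unit behaves like a logical AND) are illustrative assumptions and are not taken from the paper or from Figure 3.

```python
def perceptron_output(inputs, weights, bias):
    """Single perceptron: weighted sum of inputs plus bias, then a hard threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation >= 0 else 0


# Hypothetical weights and bias that make the unit act as a logical AND
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

outputs = [
    perceptron_output([x1, x2], AND_WEIGHTS, AND_BIAS)
    for x1 in (0, 1)
    for x2 in (0, 1)
]
```

Larger networks are obtained by connecting many such units in layers and replacing the hard threshold with a differentiable activation so that the weights can be learned from data.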
Artificial neural networks (ANNs) have been proposed in many articles as a tool able to produce software cost estimates. In [32], the author provided a novel ANN prediction model, which incorporates COCOMO and ANN-COCOMO II, to provide more accurate software estimates at the early phase of software development. The ANN was employed to regulate the software features considering historical project data. In [33], the authors provided a survey of cost estimation models using artificial neural networks. ANNs have many advantages, including:
• A neural network can perform tasks that a linear program cannot.
• When an element of the neural network fails, the network can continue without any problem because of its parallel nature.
• A neural network learns and does not need to be reprogrammed.
• It can be implemented in any application.
Fig. 3. The simple building block of ANN

The learning process in an ANN is the algorithm used to adjust the weights of the network in order to minimize the difference between the actual values and the values predicted by the network. Usually, the weights of the network are initialized randomly. There are four basic types of learning rule: Error-Correction Learning (ECL), Boltzmann Learning (BL), Hebbian Learning (HL), and Competitive Learning (CL). Detailed descriptions of these learning rules can be found in [34]. Among all the training algorithms, Back-Propagation (BP), which follows the ECL rule, is the most popular choice.
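The weight-adjustment process described above can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: a network with one tanh hidden layer and a linear output neuron, trained by per-sample back-propagation on a squared-error loss. The architecture, learning rate, number of epochs, function names, and the nine training points are all hypothetical; the paper does not specify these details, and the data are not from its tables.

```python
import math
import random


def forward(x, w1, b1, w2, b2):
    """Forward pass: tanh hidden layer followed by a linear output neuron."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, output


def train(samples, targets, n_hidden=3, lr=0.05, epochs=300, seed=0):
    """Per-sample back-propagation; returns (predict, mse_history)."""
    rng = random.Random(seed)  # weights are initialized randomly
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def mse():
        return sum((forward(x, w1, b1, w2, b2)[1] - t) ** 2
                   for x, t in zip(samples, targets)) / len(samples)

    history = [mse()]
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            hidden, y = forward(x, w1, b1, w2, b2)
            err = y - t  # error signal at the output
            for j in range(n_hidden):
                # Error propagated back through hidden unit j (uses w2[j] before its update)
                delta = err * w2[j] * (1.0 - hidden[j] ** 2)
                w2[j] -= lr * err * hidden[j]
                b1[j] -= lr * delta
                for i in range(n_in):
                    w1[j][i] -= lr * delta * x[i]
            b2 -= lr * err
        history.append(mse())

    def predict(x):
        return forward(x, w1, b1, w2, b2)[1]

    return predict, history


# Hypothetical, already-normalized inputs and targets
samples = [(x1 / 2.0, x2 / 2.0) for x1 in range(3) for x2 in range(3)]
targets = [0.2 + 0.5 * x1 + 0.3 * x2 for x1, x2 in samples]
predict, history = train(samples, targets)
```

The list `history` holds the mean square error before training and after each epoch, which is the kind of quantity a convergence plot would show. In practice the inputs (test instance, observed faults) would be scaled to a comparable range before training.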

VI. EVALUATION CRITERIA
The performance of the two developed models, the Auto-Regression and the Artificial Neural Network models, will be evaluated using a number of evaluation criteria. They are:
• The Variance-Accounted-For (VAF) criterion, adopted by [35]:

$$VAF = \left[1 - \frac{\operatorname{var}(y - \hat{y})}{\operatorname{var}(y)}\right] \times 100\%$$

• The Mean Square Error (MSE):

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

• The Euclidean distance (ED):

$$ED = \sqrt{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

• The Manhattan distance (MD):

$$MD = \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

• The Mean Magnitude of Relative Error (MMRE). In [36], the authors provided an empirical study of data modeling in software engineering applications and used a radial basis function (RBF) network to develop an effort estimation model. They considered the MMRE as the main performance measure. We will evaluate the MMRE over the training and testing data as described in [36]. The MMRE is defined as:

$$MMRE = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| y_i - \hat{y}_i \right|}{y_i}$$

where $y$ and $\hat{y}$ are the observed number of test workers and the number predicted by the model, respectively, and $n$ is the number of measurements used in the experiments.
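The five criteria can be computed with a few lines of Python. The sketch below assumes their commonly used definitions: VAF as one minus the ratio of the variance of the error to the variance of the observations (in percent), MSE as the mean of squared errors, ED as the square root of the sum of squared errors, MD as the sum of absolute errors, and MMRE as the mean of absolute errors relative to the observed values. The function names and the four observed/predicted values are hypothetical and are not results from the paper.

```python
import math


def variance(values):
    """Population variance; the VAF ratio is the same with sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def evaluate(observed, predicted):
    """Return VAF, MSE, ED, MD and MMRE for paired observed/predicted values.

    Assumes the observed values are non-constant (for VAF) and non-zero (for MMRE).
    """
    n = len(observed)
    errors = [y - p for y, p in zip(observed, predicted)]
    return {
        "VAF": (1.0 - variance(errors) / variance(observed)) * 100.0,
        "MSE": sum(e * e for e in errors) / n,
        "ED": math.sqrt(sum(e * e for e in errors)),
        "MD": sum(abs(e) for e in errors),
        "MMRE": sum(abs(e) / abs(y) for e, y in zip(errors, observed)) / n,
    }


# Hypothetical observed and predicted numbers of test workers
observed = [2.0, 4.0, 4.0, 5.0]
predicted = [2.0, 3.0, 5.0, 5.0]
scores = evaluate(observed, predicted)
```

For these illustrative values the errors are 0, 1, -1 and 0, giving MSE = 0.5, MD = 2 and MMRE = 0.125. A VAF close to 100% and small values of the other four criteria indicate a good fit.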

VII. TEST/DEBUG DATA 1
Field report data was collected to measure system faults during testing in a real-time application [37]. The software system consists of 200 modules, each having one kilo line of FORTRAN code. A Test/Debug data set of 111 measurements is given in Table I. To develop an ANN test worker estimation model, we used the data set to train the ANN. The observed and predicted numbers of workers, calculated from the test instances and the real detected faults, are shown in Figure 4. The convergence of the neural network over 3000 epochs is shown in Figure 5. The observed and predicted number of workers calculated based on the test instances and the real detected faults is shown in Figure 5. The convergence of the neural networks is shown in Figure 6.

VIII. TEST/DEBUG DATA 2
The second Test/Debug data set is given in Table III. The data set was presented in [37]. The number of measurements collected during the testing process is small. This presents a difficulty for traditional parameter estimation techniques, since it is sometimes difficult to correctly estimate model parameters from a small number of measurements. To build a test worker estimation model, we used the data set to build both the MR and ANN models. The observed and predicted numbers of workers, calculated from the test instances and the real detected faults, are shown in Figure 8. The convergence of the neural network over 3000 epochs is shown in Figure 9.
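For comparison with the ANN, a multiple regression (MR) model with two inputs can be fitted by least squares. The Python sketch below is an assumption-laden illustration: it builds a regression matrix with a leading column of ones, forms the normal equations, and solves them by Gaussian elimination. The function names and the five data rows are hypothetical; the values are not taken from Table I or Table III.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_regression(rows, ys):
    """Least-squares parameters [a0, a1, ...] for y = a0 + a1*x1 + a2*x2 + ..."""
    phi = [[1.0] + [float(v) for v in row] for row in rows]  # regression matrix
    k = len(phi[0])
    # Normal equations: (phi^T phi) theta = phi^T y
    ptp = [[sum(p[i] * p[j] for p in phi) for j in range(k)] for i in range(k)]
    pty = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(k)]
    return solve_linear_system(ptp, pty)


# Hypothetical rows: (test instance, detected faults) -> number of test workers
rows = [(1, 10), (2, 8), (3, 15), (4, 5), (5, 12)]
ys = [1 + 0.5 * x1 + 0.1 * x2 for x1, x2 in rows]
theta = fit_multiple_regression(rows, ys)
```

Since the illustrative outputs were generated from y = 1 + 0.5 x1 + 0.1 x2 without noise, the fit recovers those three parameters up to rounding. With only a handful of noisy measurements, as in a small data set, the estimated parameters can vary considerably, which is the difficulty noted above.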

IX. CONCLUSIONS AND FUTURE WORK
Estimating the number of test workers during the software testing process has become a challenging problem. Numerous methods were used to estimate and predict staffing needs, based on the firm's past experience, project types, and sales and manufacturing statistics. Thus, tools and methods are required to fill the gap in this major area of software project life cycle development. In this paper, we proposed our initial idea of developing predictive models, using ANNs, for estimating the number of test workers required for a software project during the software testing process. The developed models take the test instance and the number of observed faults as inputs. Two case studies were presented, and several evaluation criteria were used to validate the performance of the developed models. Artificial Neural Networks (ANNs) successfully capture the dynamic relationships between the inputs and the output and produce accurate prediction estimates. We plan to explore other soft computing techniques, such as fuzzy logic, to handle this problem and to develop a mathematical relationship that can be easily explained.

(IJACSA) International Journal of Advanced Computer Science and Applications, www.ijacsa.thesai.org

and Doty models were provided with excellent modeling results. In [31], a research work describes the Estimation of Projects in Contexts of Uncertainty (EPCU) model. The model is an estimation process based on fuzzy logic whose objective is to solve the project estimation problem by taking advantage of expert judgment in a formal way, without using quantitative historical data.

V. WHAT IS ANN?
According to the Defense Advanced Research Projects Agency (DARPA) Neural Network Study (1988, AFCEA International Press, p. 60):

Fig. 7. Observed and predicted number of test workers using Multiple Regression Model: Test/Debug Data 2

TABLE I. TEST/DEBUG DATA 1 ($x_1$: TEST INSTANCES, $x_2$: REAL DETECTED FAULTS, $y_k$: NO. OF TEST WORKERS)

TABLE III. TEST/DEBUG DATA 2 ($x_1$: TEST INSTANCES, $x_2$: REAL DETECTED FAULTS, $y_k$: NO. OF TEST WORKERS)

TABLE IV. EVALUATION CRITERIA OF THE ANN MODELS