Comparison and Analysis of Different Software Cost Estimation Methods

Software cost estimation is the process of predicting the effort required to develop a software system. The basic inputs for software cost estimation are the code size and a set of cost drivers; the output is effort in person-months (PM). Here, the use of support vector regression (SVR) is proposed for the estimation of software project effort. We use the COCOMO dataset, and our results are compared with those of Intermediate COCOMO and of the Multiple Objective Particle Swarm Optimization (MOPSO) model on this dataset. The simulations show that SVR outperforms the other estimation techniques. This paper provides a comparative study of SVR, Intermediate COCOMO and the MOPSO model for the estimation of software project effort, analyzed in terms of accuracy and error rate. The data mining tool Weka is used for the simulations.


I. INTRODUCTION
Cost estimation is an approximation of the probable cost of a product, program, or project, computed on the basis of available information. Accurate cost estimation is very important for every kind of project. If a project is not estimated properly, its actual cost can become very high, sometimes 150-200% more than the original cost [19], so it is necessary to estimate the project correctly. The cost of a project is a function of many parameters. Size is a primary cost factor in most models and can be measured in lines of code (LOC), thousands of delivered lines of code (KDLOC), or function points. A number of models have been developed to establish the relation between size and effort for software cost estimation.
Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Data mining helps us to classify past project data and extract valuable information from it. Support vector regression (SVR) is a kernel method for regression based on the principle of structural risk minimization [11,3]. Kernel methods have outperformed more traditional techniques in a number of problems, including classification and regression [11,3]. Here, the use of SVR is proposed for the estimation of software project cost, and this technique is found to outperform other popular cost estimation procedures in terms of accuracy.
The rest of the paper is organized as follows. The literature review covers some existing estimation methods. The basic idea of the new estimation approach is then discussed, followed by the simulation experiments. Finally, we discuss the results and give concluding remarks.

II. LITERATURE REVIEW
Various effort estimation models have been developed over the last four decades. The most commonly used methods for predicting software development effort are Function Point Analysis and the Constructive Cost Model (COCOMO) [10]. Function point analysis is a method of quantifying the size and complexity of a software system in terms of the functions that the system delivers to the user [4]. The function point count does not depend on the programming languages or tools used to develop a software project [3]. COCOMO was developed by Boehm [2] and is based on linear least-squares regression. Using lines of code (LOC) as the unit of measure for software size is itself problematic [7]. These methods fail to deal with the implicit non-linearity and the interactions between the characteristics of the project and the effort [5,11].
In recent years, a number of alternative modelling techniques have been proposed. They include artificial neural networks, analogy-based reasoning, fuzzy systems, and ensemble techniques. An ensemble combines the results of individual methods [12,17]. In analogy-based cost estimation, similarity measures between a pair of projects play a critical role [16]. This type of model calculates the distance between the software project being estimated and each of the historical software projects, and then retrieves the most similar project to generate an effort estimate [14]. Further, Lefley and Shepperd [9] applied genetic programming to improve software cost estimation on public datasets with great success. Later, Vinay Kumar et al. [15] used wavelet neural networks for software cost prediction. Unfortunately, the accuracy of these models is not satisfactory, so there is still scope for more accurate software cost estimation techniques.

III. THE BASIC IDEA
Suppose we are given a training dataset $\{(x_1, y_1), \ldots, (x_l, y_l)\} \subset \mathcal{X} \times \mathbb{R}$, where $\mathcal{X}$ denotes the space of the input patterns (e.g. $\mathcal{X} = \mathbb{R}^d$). The goal of regression is to find the function $f(x)$ that best models the training data. In our case, we are interested in building a regression model based on the training data and using it subsequently to predict the total effort, in man-months, of future software projects. In linear regression, this is done by finding the line that minimizes the sum of squared errors on the training set.
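The least-squares fit mentioned above can be sketched in a few lines of Python. This is only an illustration for a single input variable: the function name `fit_line` and the toy size/effort values are hypothetical and are not taken from the COCOMO dataset.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form ordinary least-squares solution for one input variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: size in KLOC and effort in man-months.
sizes = [10.0, 20.0, 30.0, 40.0]
efforts = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_line(sizes, efforts)
predicted_effort = slope * 50.0 + intercept  # estimate for a 50 KLOC project
```

Every squared residual contributes to this fit, however small. The ɛ-insensitive loss used by SVR differs in exactly this respect.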

A. Support Vector Regression
In this work we propose to use ɛ-SVR, which defines the ɛ-insensitive loss function. This type of loss function defines a band around the true outputs, sometimes referred to as a tube, as shown in Fig. 1. The idea is that errors smaller than a certain threshold ɛ > 0 are ignored. That is, errors inside the band are considered to be zero. On the other hand, errors caused by points outside the band are measured by the variables ξ and ξ*, as shown in Fig. 1.
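As a minimal sketch of the ɛ-insensitive loss described above, the function below returns zero for errors inside the tube and the excess over ɛ otherwise. The function name and the numeric values are illustrative assumptions, not part of the original experiments.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero for errors inside the tube of half-width eps, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Illustrative values: a tube of half-width 5 man-months around the true effort.
inside = eps_insensitive_loss(100.0, 103.0, eps=5.0)  # error of 3 is ignored
above = eps_insensitive_loss(100.0, 110.0, eps=5.0)   # error of 10, excess 5
below = eps_insensitive_loss(100.0, 90.0, eps=5.0)    # symmetric on both sides
```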
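In the case of SVR for linear regression, $f(x) = \langle w, x \rangle + b$, with $w \in \mathcal{X}$, $b \in \mathbb{R}$, where $\langle \cdot, \cdot \rangle$ denotes the dot product. For the case of nonlinear regression, $f(x) = \langle w, \phi(x) \rangle + b$, where $\phi$ is some nonlinear function which maps the input space to a higher (maybe infinite) dimensional feature space. In ɛ-SVR, the weight vector $w$ and the threshold $b$ are chosen to optimize the following problem [11]:

$$
\begin{aligned}
\min_{w,\, b,\, \xi,\, \xi^*} \quad & \frac{1}{2} \langle w, w \rangle + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \\
\text{subject to} \quad & (\langle w, \phi(x_i) \rangle + b) - y_i \le \varepsilon + \xi_i, \\
& y_i - (\langle w, \phi(x_i) \rangle + b) \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \ldots, l.
\end{aligned}
\tag{1}
$$

The constant $C > 0$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than ɛ are tolerated. ξ and ξ* are called slack variables and measure the cost of the errors on the training points. ξ measures deviations exceeding the target value by more than ɛ and ξ* measures deviations which are more than ɛ below the target value, as shown in Fig. 1.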
The idea of SVR is to minimize an objective function which considers both the norm of the weight vector w and the losses measured by the slack variables (see Eq. (1)). The minimization of the norm of w is one of the ways to ensure the flatness of ƒ [11].
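To make the two terms of this objective concrete, the sketch below evaluates it for a candidate linear model: half the squared norm of w plus C times the total slack. It assumes the smallest slacks compatible with the ɛ-tube, with ξ for estimates above the target and ξ* for estimates below it. The function name and the toy data are hypothetical. The code only evaluates the objective and does not solve the optimization problem.

```python
def svr_primal_objective(w, b, xs, ys, C, eps):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi + xi_star) for f(x) = <w, x> + b."""
    norm_sq = sum(wj * wj for wj in w)
    total_slack = 0.0
    for x, y in zip(xs, ys):
        f = sum(wj * xj for wj, xj in zip(w, x)) + b
        xi = max(0.0, f - y - eps)       # estimate above the target by more than eps
        xi_star = max(0.0, y - f - eps)  # estimate below the target by more than eps
        total_slack += xi + xi_star
    return 0.5 * norm_sq + C * total_slack


# Hypothetical one-dimensional example with three training points.
xs = [[1.0], [2.0], [3.0]]
ys = [2.0, 5.0, 3.0]
objective = svr_primal_objective(w=[2.0], b=0.0, xs=xs, ys=ys, C=10.0, eps=0.5)
```

A larger C penalizes points outside the tube more heavily, while a smaller C favors a flatter function.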
The SVR algorithm involves the use of Lagrangian multipliers, which rely solely on dot products of $\phi(x)$. This can be accomplished via kernel functions, defined as $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$. Thus, the method avoids computing the transformation $\phi(x)$ explicitly. The details of the solution can be found in [11].
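The following sketch illustrates why a kernel avoids the explicit transformation. For a homogeneous degree-2 polynomial kernel on two-dimensional inputs, the kernel value equals the dot product of explicitly mapped feature vectors. The kernels shown are illustrative choices only, since the kernel used in the experiments is not stated here, and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """K(x, z) = <x, z>^2, computed without ever forming phi(x)."""
    return dot(x, z) ** 2


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel, whose feature space is infinite-dimensional."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x, z = (1.0, 2.0), (3.0, 0.5)
explicit = dot(phi(x), phi(z))  # dot product in the feature space
implicit = poly2_kernel(x, z)   # same value from the input space alone
```

For the RBF kernel no finite explicit map exists, so evaluating K directly is the only practical option.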

IV. EXPERIMENTS
The regression methods considered in this paper were compared using the well-known COCOMO software project dataset, reproduced in Table I. This dataset consists of two independent variables, Size and EAF (Effort Adjustment Factor), and one dependent variable, Effort. Size is in KLOC (thousands of lines of code) and effort is given in man-months [1]. In this work we are interested in estimating the effort of future projects. The simulations were carried out using the Weka tool [13]. In Weka, SVR is implemented using the Sequential Minimal Optimization (SMO) algorithm [6].
To conduct the study and establish the effectiveness of the models, projects from the COCOMO dataset were used. We calculated the Intermediate COCOMO effort by using the following equation:

$$
\text{Effort} = a \times (\text{Size})^{b} \times \text{EAF}
$$

where a and b are values depending on the complexity of the software (for organic projects a=3.2, b=1.05; for semi-detached a=3.0, b=1.12; and for embedded a=2.8, b=1.2). The MOPSO model effort [18] is calculated from the cost parameters a and b and the bias factor c, with a=3.96, b=1.12 and c=5.42.
The performance measures considered in our work are the Mean Absolute Relative Error (MARE) and Pred(25). The MARE is given by the following equation:

$$
\text{MARE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|f_i - y_i|}{y_i}
$$

where $f_i$ is the estimated value, $y_i$ is the actual value, and $n$ is the number of data points. Pred(25) is defined as the percentage of predictions falling within 25% of the actual known value.
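The baseline estimate and the two performance measures can be sketched as follows. The sketch assumes the usual Intermediate COCOMO form, Effort = a × Size^b × EAF, with the coefficients quoted above. It also assumes that MARE is reported as a fraction rather than a percentage, and that "within 25%" includes the boundary. The function names and the toy effort values are hypothetical and are not results from this study.

```python
# (a, b) coefficient pairs quoted in the text for each project mode.
COCOMO_COEFFS = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.2),
}


def intermediate_cocomo_effort(size_kloc, eaf, mode):
    """Effort in man-months, assuming Effort = a * Size^b * EAF."""
    a, b = COCOMO_COEFFS[mode]
    return a * size_kloc ** b * eaf


def mare(estimated, actual):
    """Mean absolute relative error, returned as a fraction (assumption)."""
    return sum(abs(f - y) / y for f, y in zip(estimated, actual)) / len(actual)


def pred(estimated, actual, level=25):
    """Percentage of estimates within `level` percent of the actual value."""
    hits = sum(
        1 for f, y in zip(estimated, actual) if abs(f - y) / y <= level / 100.0
    )
    return 100.0 * hits / len(actual)


# Hypothetical efforts in man-months, for illustration only.
actual = [100.0, 200.0, 50.0, 80.0]
estimated = [110.0, 150.0, 70.0, 80.0]

mare_value = mare(estimated, actual)
pred25 = pred(estimated, actual, level=25)
```

A lower MARE and a higher Pred(25) both indicate a more accurate model.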
We carried out simulations in which SVR estimates the effort using both independent variables (Size and EAF). The results of our simulations are shown in Table II. Figure 2 shows the graph of measured effort versus estimated effort for the Intermediate COCOMO, MOPSO and SVR models.
From Figure 2, one can see that the SVR estimated efforts are very close to the measured effort.

V. RESULTS AND DISCUSSIONS
The results are tabulated in Table III. Figure 3 shows the performance measures of the Intermediate COCOMO, MOPSO and SVR models.

VI. CONCLUDING REMARKS
This paper presents the use of Support Vector Regression for the estimation of software project effort. We carried out simulations using the COCOMO dataset. We used the Weka tool for the simulations because it provides many different machine learning algorithms that help us to classify the data easily. The results were compared with both the Intermediate COCOMO and MOPSO models. The accuracy of each model is measured in terms of its error rate. The results show that SVR gives better estimates than the other two models on both MARE and Pred(25). Future work should investigate further data mining algorithms that are easy to use and can help to improve the process of software cost estimation.

TABLE II.
ESTIMATED EFFORTS OF DIFFERENT TYPES OF MODELS

It was observed that SVR gives better results in comparison with the Intermediate COCOMO and MOPSO models: its MARE is lower and its Pred(25) is higher. These results suggest that incorporating data mining and machine learning techniques into existing software cost estimation techniques can effectively improve the accuracy of the models.

TABLE III.
PERFORMANCE AND COMPARISONS