Detection of Reliable Software Using SPRT

Abstract: In classical hypothesis testing, a large volume of data has to be collected before conclusions are drawn, which may take considerable time. Sequential analysis, by contrast, can be adopted to decide very quickly whether the developed software is reliable or unreliable. The procedure adopted for this is the Sequential Probability Ratio Test (SPRT). In the present paper, we examine the performance of SPRT on time domain data using the exponential imperfect debugging model and analyze the results by applying it to 5 data sets. The parameters are estimated using Maximum Likelihood Estimation.


I. INTRODUCTION
Wald's procedure is particularly relevant if the data is collected sequentially. Sequential Analysis differs from Classical Hypothesis Testing, where the number of cases tested or collected is fixed at the beginning of the experiment. In Classical Hypothesis Testing, the data collection is executed without analysis and consideration of the data. Only after all the data is collected is the analysis done and the conclusions drawn. In Sequential Analysis, however, every case is analyzed directly after being collected, and the data collected up to that moment is then compared with certain threshold values, incorporating the new information obtained from the freshly collected case. This approach allows one to draw conclusions during the data collection, and a final conclusion can possibly be reached at a much earlier stage than is the case in Classical Hypothesis Testing. The advantages of Sequential Analysis are easily seen. As data collection can be terminated after fewer cases and decisions taken earlier, the savings in terms of human life and misery, and the financial savings, might be considerable.
In the analysis of software failure data, we often deal with either Time Between Failures or failure count in a given time interval. If it is further assumed that the average number of recorded failures in a given time interval is directly proportional to the length of the interval, and the random number of failure occurrences in the interval is explained by a Poisson process, then the probability equation of the stochastic process representing the failure occurrences is given by a homogeneous Poisson process with the expression

$$P[N(t) = n] = \frac{(\lambda t)^n e^{-\lambda t}}{n!}, \qquad n = 0, 1, 2, \ldots$$

where $\lambda$ is the failure rate.

Stieber [5] observes that if classical testing strategies are used, the application of software reliability growth models may be difficult and reliability predictions can be misleading. However, he observes that statistical methods can be successfully applied to the failure data. He demonstrated his observation by applying the well-known sequential probability ratio test of Wald [4] to software failure data in order to detect unreliable software components and compare the reliability of different software versions. In this paper we consider a popular SRGM, the exponential imperfect debugging model, and adopt the principle of Stieber in detecting unreliable software components in order to accept or reject the developed software. The theory proposed by Stieber is presented in Section 2 for ready reference. The extension of this theory to the SRGM (exponential imperfect debugging) is presented in Section 3. The Maximum Likelihood parameter estimation method is presented in Section 4, and the application of the decision rule to detect unreliable software components with respect to the proposed SRGM is given in Section 5.

II. WALD'S SEQUENTIAL TEST FOR A POISSON PROCESS
The sequential probability ratio test was developed by A. Wald at Columbia University in 1943. Due to its usefulness in development work on military and naval equipment, it was classified as "Restricted" under the Espionage Act (Wald, 1947). A big advantage of sequential tests is that they require fewer observations (less time) on average than fixed sample size tests. SPRTs are widely used for statistical quality control in manufacturing processes. An SPRT for homogeneous Poisson processes is described below.
Let $\{N(t), t \ge 0\}$ be a homogeneous Poisson process with rate $\lambda$. In our case, $N(t)$ is the number of failures up to time $t$ and $\lambda$ is the failure rate. Suppose we put a system on test (for example a software system, where testing is done according to a usage profile and no faults are corrected) and want to estimate its failure rate $\lambda$. We cannot expect to estimate $\lambda$ precisely. But we want to reject the system with high probability if our data suggest that the failure rate is larger than $\lambda_1$, and accept it with high probability if it is smaller than $\lambda_0$. As always with statistical tests, there is some risk of getting the wrong answer. So we have to specify two (small) numbers $\alpha$ and $\beta$, where $\alpha$ is the probability of falsely rejecting the system, that is, rejecting the system even if $\lambda \le \lambda_0$. This is the producer's risk. $\beta$ is the probability of falsely accepting the system, that is, accepting the system even if $\lambda \ge \lambda_1$. This is the consumer's risk. With specified choices of $\lambda_0$ and $\lambda_1$ such that $0 < \lambda_0 < \lambda_1$, the probabilities of finding $N(t)$ failures in the time span $(0, t)$ with $\lambda_1$, $\lambda_0$ as the failure rates are respectively given by

$$P_1 = \frac{e^{-\lambda_1 t} (\lambda_1 t)^{N(t)}}{N(t)!} \qquad (2.1)$$

$$P_0 = \frac{e^{-\lambda_0 t} (\lambda_0 t)^{N(t)}}{N(t)!} \qquad (2.2)$$

The ratio

$$\frac{P_1}{P_0} = \exp[-(\lambda_1 - \lambda_0) t] \left(\frac{\lambda_1}{\lambda_0}\right)^{N(t)}$$

at any time $t$ is taken as a measure for deciding between $\lambda_0$ and $\lambda_1$. The decision rule of SPRT is to decide in favor of $\lambda_1$, in favor of $\lambda_0$, or to continue by observing the number of failures at a later time than $t$, according as $P_1/P_0$ is greater than or equal to a constant $A$, less than or equal to a constant $B$, or in between the constants $A$ and $B$.
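That is, we decide the given software product to be unreliable, reliable, or continue [3] the test process with one more observation in the failure data, according as

$$\frac{P_1}{P_0} \ge A \qquad (2.3)$$

$$\frac{P_1}{P_0} \le B \qquad (2.4)$$

$$B < \frac{P_1}{P_0} < A \qquad (2.5)$$

The approximate values of the constants $A$ and $B$ are taken as

$$A \cong \frac{1 - \beta}{\alpha}, \qquad B \cong \frac{\beta}{1 - \alpha}$$

where $\alpha$ and $\beta$ are the risk probabilities as defined earlier. A simplified version of the above decision process is to reject the system as unreliable if $N(t)$ falls for the first time above the line

$$N_U(t) = a t + b_2 \qquad (2.6)$$

to accept the system as reliable if $N(t)$ falls for the first time below the line

$$N_L(t) = a t - b_1 \qquad (2.7)$$

and to continue the test with one more observation on $(t, N(t))$ as long as the random graph of $[t, N(t)]$ is between the two linear boundaries given by equations (2.6) and (2.7), where

$$a = \frac{\lambda_1 - \lambda_0}{\ln(\lambda_1/\lambda_0)}, \qquad b_1 = \frac{\ln[(1 - \alpha)/\beta]}{\ln(\lambda_1/\lambda_0)}, \qquad b_2 = \frac{\ln[(1 - \beta)/\alpha]}{\ln(\lambda_1/\lambda_0)} \qquad (2.8)$$

The parameters $\alpha$, $\beta$, $\lambda_0$ and $\lambda_1$ can be chosen in several ways. One way suggested by Stieber is

$$\lambda_0 = \frac{\lambda \ln q}{q - 1}, \qquad \lambda_1 = q \, \frac{\lambda \ln q}{q - 1}, \qquad \text{where } q = \frac{\lambda_1}{\lambda_0}$$

If $\lambda_0$ and $\lambda_1$ are chosen in this way, the slope of $N_U(t)$ and $N_L(t)$ equals $\lambda$. The other two ways of choosing $\lambda_0$ and $\lambda_1$ are from past projects (for a comparison of the projects) and from part of the data, to compare the reliability of different functional areas (components).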
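The linear decision rule of equations (2.6) and (2.7) can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the usual closed forms for Wald's Poisson SPRT (slope $(\lambda_1 - \lambda_0)/\ln(\lambda_1/\lambda_0)$, offsets $\ln[(1-\alpha)/\beta]/\ln(\lambda_1/\lambda_0)$ and $\ln[(1-\beta)/\alpha]/\ln(\lambda_1/\lambda_0)$), and the function names and the numerical values of $\lambda$ and $q$ are hypothetical, not taken from the paper.

```python
import math

def stieber_lines(lam0, lam1, alpha, beta):
    """Slope a and offsets b1, b2 of the accept/reject lines for N(t)."""
    if not 0 < lam0 < lam1:
        raise ValueError("need 0 < lam0 < lam1")
    log_q = math.log(lam1 / lam0)
    a = (lam1 - lam0) / log_q
    b1 = math.log((1 - alpha) / beta) / log_q  # offset of the acceptance line
    b2 = math.log((1 - beta) / alpha) / log_q  # offset of the rejection line
    return a, b1, b2

def hpp_decision(t, n_failures, lam0, lam1, alpha, beta):
    """Compare the observed point (t, N(t)) with the two decision lines."""
    a, b1, b2 = stieber_lines(lam0, lam1, alpha, beta)
    if n_failures >= a * t + b2:
        return "reject"    # on or above N_U(t): unreliable
    if n_failures <= a * t - b1:
        return "accept"    # on or below N_L(t): reliable
    return "continue"      # between the lines: take one more observation

def stieber_rates(lam, q):
    """Stieber's choice of lam0, lam1 so that the line slope equals lam."""
    lam0 = lam * math.log(q) / (q - 1)
    return lam0, q * lam0

# Hypothetical target failure rate and ratio q, for illustration only.
lam0, lam1 = stieber_rates(lam=0.01, q=2.0)
slope, b1, b2 = stieber_lines(lam0, lam1, alpha=0.05, beta=0.2)
print("slope:", slope, "b1:", b1, "b2:", b2)
print(hpp_decision(100.0, 6, lam0, lam1, alpha=0.05, beta=0.2))
```

With Stieber's choice of rates, the printed slope should agree with the chosen $\lambda$ up to rounding, which is the property stated in the text.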

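III. SEQUENTIAL TEST FOR SOFTWARE RELIABILITY GROWTH MODELS
In Section 2, for the Poisson process, we know that the expected value of $N(t)$ is $\lambda t$, called the average number of failures experienced in time $t$. This is also called the mean value function of the Poisson process. On the other hand, if we consider a Poisson process with a general function $m(t)$ (not necessarily linear) as its mean value function, the probability equation of such a process is

$$P[N(t) = y] = \frac{[m(t)]^y e^{-m(t)}}{y!}, \qquad y = 0, 1, 2, \ldots$$

We may write

$$P_1 = \frac{e^{-m_1(t)} [m_1(t)]^{N(t)}}{N(t)!}, \qquad P_0 = \frac{e^{-m_0(t)} [m_0(t)]^{N(t)}}{N(t)!}$$

where $m_1(t)$, $m_0(t)$ are values of the mean value function at specified sets of its parameters indicating unreliable software and reliable software respectively. Let $P_0$, $P_1$ be the values of this probability at the two specifications, with $m_0(t) < m_1(t)$. Then the SPRT procedure is as follows.

Accept the system as reliable if $P_1/P_0 \le B$, i.e.

$$N(t) \le \frac{\ln\left(\frac{\beta}{1 - \alpha}\right) + m_1(t) - m_0(t)}{\ln m_1(t) - \ln m_0(t)} \qquad (3.1)$$

Decide the system to be unreliable and reject if $P_1/P_0 \ge A$, i.e.

$$N(t) \ge \frac{\ln\left(\frac{1 - \beta}{\alpha}\right) + m_1(t) - m_0(t)}{\ln m_1(t) - \ln m_0(t)} \qquad (3.2)$$

Continue the test procedure as long as

$$\frac{\ln\left(\frac{\beta}{1 - \alpha}\right) + m_1(t) - m_0(t)}{\ln m_1(t) - \ln m_0(t)} < N(t) < \frac{\ln\left(\frac{1 - \beta}{\alpha}\right) + m_1(t) - m_0(t)}{\ln m_1(t) - \ln m_0(t)} \qquad (3.3)$$

Substituting the appropriate expression of the mean value function $m(t)$ of the considered model into (3.1), (3.2) and (3.3) gives the acceptance region, the rejection region and the continuation region of the respective decision rule.

It may be noted that in the above model the decision rules are exclusively based on the strength of the sequential procedure $(\alpha, \beta)$ and the values of the respective mean value functions, namely $m_0(t)$, $m_1(t)$. If the mean value function is linear in $t$ passing through the origin, that is, $m(t) = \lambda t$, the decision rules become decision lines as described by Stieber (1997). In that sense, equations (3.1), (3.2), (3.3) can be regarded as generalizations of the decision procedure of Stieber (1997). The applications of these results to live software failure data are presented with analysis in Section 5.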
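The generalized rule of (3.1) and (3.2) can be sketched as follows. The sketch assumes the bounds obtained by taking logarithms of $P_1/P_0 \le B$ and $P_1/P_0 \ge A$, namely $[\ln(\beta/(1-\alpha)) + m_1(t) - m_0(t)] / [\ln m_1(t) - \ln m_0(t)]$ and $[\ln((1-\beta)/\alpha) + m_1(t) - m_0(t)] / [\ln m_1(t) - \ln m_0(t)]$, with $m_0(t) < m_1(t)$. Because the explicit mean value function of the exponential imperfect debugging model is not reproduced here, the Goel-Okumoto form $m(t) = a(1 - e^{-bt})$ is used purely as a stand-in, and the parameter values and failure times are hypothetical.

```python
import math

def nhpp_sprt_bounds(m0_t, m1_t, alpha, beta):
    """Acceptance and rejection bounds on N(t) for given m0(t) < m1(t)."""
    if not 0 < m0_t < m1_t:
        raise ValueError("need 0 < m0(t) < m1(t)")
    denom = math.log(m1_t) - math.log(m0_t)
    shift = m1_t - m0_t
    lower = (math.log(beta / (1 - alpha)) + shift) / denom   # accept at or below
    upper = (math.log((1 - beta) / alpha) + shift) / denom   # reject at or above
    return lower, upper

def nhpp_sprt(failure_times, m0, m1, alpha=0.05, beta=0.2):
    """Apply the rule at each failure time t_i, where N(t_i) = i.

    Returns (decision, index of the observation at which it was reached).
    """
    for i, t in enumerate(failure_times, start=1):
        lower, upper = nhpp_sprt_bounds(m0(t), m1(t), alpha, beta)
        if i <= lower:
            return "accept", i
        if i >= upper:
            return "reject", i
    return "continue", len(failure_times)

def goel_okumoto(a, b):
    """Stand-in mean value function m(t) = a(1 - exp(-b t)); NOT the paper's model."""
    return lambda t: a * (1.0 - math.exp(-b * t))

# Hypothetical specifications b0 < b1 and hypothetical cumulative failure times.
m0 = goel_okumoto(a=30.0, b=0.001)
m1 = goel_okumoto(a=30.0, b=0.006)
print(nhpp_sprt([5.0, 8.0, 12.0, 15.0, 20.0], m0, m1))
print(nhpp_sprt([400.0, 900.0], m0, m1))
```

Passing the mean value functions as callables keeps the decision logic independent of the model, so a different $m(t)$ can be substituted without changing `nhpp_sprt`. With linear functions $m_0(t) = \lambda_0 t$ and $m_1(t) = \lambda_1 t$ the two bounds reduce to straight lines in $t$.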

IV. ML (MAXIMUM LIKELIHOOD) PARAMETER ESTIMATION
The idea behind maximum likelihood parameter estimation is to determine the parameters that maximize the probability (likelihood) of the sample data. The method of maximum likelihood is considered to be more robust (with some exceptions) and yields estimators with good statistical properties. In other words, MLE methods are versatile in their approach and can be applied to many models and also to different types of data. Although the methodology for maximum likelihood estimation is simple, the implementation is mathematically complex. With today's computing power, however, mathematical complexity is not a big obstacle. Suppose we conduct an experiment and obtain $N$ independent observations $t_1, t_2, \ldots, t_N$. Then the likelihood function is given by [9] the following product:

$$L(t_1, t_2, \ldots, t_N \mid \theta_1, \theta_2, \ldots, \theta_k) = \prod_{i=1}^{N} f(t_i; \theta_1, \theta_2, \ldots, \theta_k)$$

The likelihood function in terms of the failure intensity $\lambda(t) = m'(t)$ is:

$$L = e^{-m(t_N)} \prod_{i=1}^{N} \lambda(t_i)$$

The logarithmic likelihood function is given by:

$$\ln L = \ln\left(e^{-m(t_N)} \prod_{i=1}^{N} \lambda(t_i)\right)$$

which can be written as

$$\ln L = \sum_{i=1}^{N} \ln[\lambda(t_i)] - m(t_N)$$

The maximum likelihood estimators (MLE) of $\theta_1, \theta_2, \ldots, \theta_k$ are obtained by maximizing $L$ or $\Lambda$, where $\Lambda$ is $\ln L$. By maximizing $\Lambda$, which is much easier to work with than $L$, the maximum likelihood estimators of $\theta_1, \theta_2, \ldots, \theta_k$ are the simultaneous solutions of $k$ equations such that:

$$\frac{\partial \Lambda}{\partial \theta_j} = 0, \qquad j = 1, 2, \ldots, k$$

The parameters $a$ and $b$ are estimated using the iterative Newton-Raphson method, which is given as

$$x_{n+1} = x_n - \frac{g(x_n)}{g'(x_n)}$$

For the present model of exponential imperfect debugging at $c = 0.05$, the parameters are estimated from [10].
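As an illustration of the Newton-Raphson step, the sketch below fits a two-parameter NHPP to time domain data. It does not use the exponential imperfect debugging model, whose likelihood equations are not given here. Instead it assumes the Goel-Okumoto stand-in $m(t) = a(1 - e^{-bt})$, for which setting the partial derivatives of $\ln L = \sum \ln \lambda(t_i) - m(t_N)$ to zero gives $a = N / (1 - e^{-b t_N})$ and a single equation $g(b) = N/b - \sum t_i - N t_N / (e^{b t_N} - 1) = 0$ in $b$. Function names, the starting value and the failure times are hypothetical.

```python
import math

def go_score(b, times):
    """g(b): derivative of ln L in b after eliminating a (Goel-Okumoto stand-in)."""
    n, t_n = len(times), times[-1]
    return n / b - sum(times) - n * t_n / math.expm1(b * t_n)

def go_score_prime(b, times):
    """g'(b), needed for the Newton-Raphson update."""
    n, t_n = len(times), times[-1]
    e = math.exp(b * t_n)
    return -n / b**2 + n * t_n**2 * e / (e - 1.0) ** 2

def fit_goel_okumoto(times, b_start=None, tol=1e-12, max_iter=100):
    """Return (a, b) solving the likelihood equations by Newton-Raphson."""
    n, t_n = len(times), times[-1]
    # For this stand-in model a positive root exists only if failures cluster early.
    if sum(times) >= n * t_n / 2.0:
        raise ValueError("no finite positive MLE of b for these data")
    b = b_start if b_start is not None else 1.0 / t_n
    for _ in range(max_iter):
        b_new = b - go_score(b, times) / go_score_prime(b, times)
        if b_new <= 0.0:          # safeguard: keep the iterate positive
            b_new = b / 2.0
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    a = n / (1.0 - math.exp(-b * t_n))
    return a, b

# Hypothetical cumulative failure times showing reliability growth.
times = [10, 25, 45, 80, 130, 200, 300, 450, 650, 1000]
a_hat, b_hat = fit_goel_okumoto(times)
print("a =", a_hat, "b =", b_hat)
```

Eliminating $a$ analytically reduces the two simultaneous equations to a one-dimensional root search, which is why a scalar Newton-Raphson update is sufficient in this stand-in case.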

V. SPRT ANALYSIS OF DATA SETS
The developed SPRT methodology applies to software failure data of the form $[t, N(t)]$, where $N(t)$ is the number of failures of the software system or its subsystem in $t$ units of time. In this section, we evaluate the decision rules based on the considered mean value function for five different data sets of the above form, borrowed from [2], [7] and [8]. The estimates of $a$ and $b$ and the specifications $b_0$, $b_1$ for each data set are as follows.

Data set     Estimate of a    Estimate of b    b0          b1
DS 1 [7]     31.738136        0.003253         0.000753    0.005753
DS 2 [2]     24.182003        0.003091         0.000591    0.005591
DS 3 [2]     22.286839        0.003627         0.001127    0.006127
DS 4 [2]     32.293828        0.006095         0.003595    0.008595
DS 5 [8]     30.276648        0.020823         0.018323    0.023323

Using the selected $b_0$, $b_1$ and subsequently $m_0(t)$, $m_1(t)$ for the model, we calculated the decision rules given by Equations 3.1 and 3.2 sequentially at each $t$ of the data sets, taking the strength $(\alpha, \beta)$ as (0.05, 0.2). These are presented for the model in Table II. From Table II we see that a decision either to accept or to reject the system is reached well in advance of the last time instant of the data (the testing time). The following consolidated table reveals the iterations required to come to a decision about the software of each data set.

VI. CONCLUSION
Table II shows that the exponential imperfect debugging model, as exemplified for 5 data sets, performs well in arriving at a decision. Out of the 5 data sets, the procedure applied to the model gives a decision of rejection for 2, acceptance for 2 and continuation for 1, at various time instants of the data, as follows. DS1 and DS4 are accepted at the 1st and 3rd instants of time respectively; DS2 and DS3 are rejected at the 2nd and 6th instants of time respectively; DS5 is continuing. Therefore, we may conclude that applying SPRT to the data sets allows an early conclusion on whether the software is reliable or unreliable.
Depending on the form of $m(t)$ we get various Poisson processes, called non-homogeneous Poisson processes (NHPP). For our model the mean value function is given as in [7][8], with the assumption of $c = 0.05$. Based on the estimate of the parameter $b$ in each mean value function, we have chosen the specifications $b_0 = b - \delta$, $b_1 = b + \delta$ equidistant on either side of the estimate of $b$ obtained through a data set to apply SPRT, such that $b_0 < b < b_1$. Assuming the value $\delta = 0.0025$, the choices are given in the following table.

TABLE I. ESTIMATES OF a, b AND SPECIFICATIONS OF b0, b1
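The specifications in Table I can be checked against the stated rule $b_0 = b - \delta$, $b_1 = b + \delta$ with $\delta = 0.0025$. The short sketch below uses the estimates of $b$ and the tabulated $b_0$, $b_1$ reported for the five data sets. The function and variable names are illustrative only.

```python
DELTA = 0.0025  # half-width of the interval around the estimate of b

# Data set: (estimate of b, tabulated b0, tabulated b1), as reported in Table I.
TABLE_I = {
    "DS 1": (0.003253, 0.000753, 0.005753),
    "DS 2": (0.003091, 0.000591, 0.005591),
    "DS 3": (0.003627, 0.001127, 0.006127),
    "DS 4": (0.006095, 0.003595, 0.008595),
    "DS 5": (0.020823, 0.018323, 0.023323),
}

def specifications(b_hat, delta=DELTA):
    """Return (b0, b1) equidistant on either side of the estimate of b."""
    return round(b_hat - delta, 6), round(b_hat + delta, 6)

def table_is_consistent(table, delta=DELTA, tol=1e-9):
    """Check every tabulated (b0, b1) pair against b -/+ delta."""
    for b_hat, b0_tab, b1_tab in table.values():
        b0, b1 = specifications(b_hat, delta)
        if abs(b0 - b0_tab) > tol or abs(b1 - b1_tab) > tol:
            return False
    return True

for name, (b_hat, _, _) in TABLE_I.items():
    print(name, specifications(b_hat))
print("consistent with delta = 0.0025:", table_is_consistent(TABLE_I))
```

Each pair also satisfies $0 < b_0 < b < b_1$, which the sequential test requires.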

TABLE II. SPRT ANALYSIS FOR 5 DATA SETS