Evaluating Dependency based Package-level Metrics for Multi-objective Maintenance Tasks

Role of packages in organization and maintenance of software systems has acquired vital importance in recent research of software quality. With an advancement in modularization approaches of object oriented software, packages are widely considered as re-usable and maintainable entities of objectoriented software architectures, specially to avoid complicated dependencies and insure software design of well identified services. In this context, recently research study of H. Abdeen on automatic optimization of package dependencies provide composite frame of metrics for package quality and overall source code modularization. There is an opportunity to conduct comprehensive empirical analysis over proposed metrics for assessing their usefulness and application for fault-prediction, design flaw detection, identification of source code anomalies and architectural erosion. In this paper, we examine impact of these dependency optimization based metrics in wide spectrum of software quality for single package and entire software modularization. Our experimental work is conducted over open source software systems through statistical methodology based on cross validation fault-prediction and correlation. We conclude with empirical evidence that dependency based package modularization metrics provide more accurate view for predicting fault-prone packages and improvement of overall software structure. Thus, application of these metrics can help the developers and software practitioners to insure proactive management of the source code dependencies and avoid design flaws during software development. Keywords—Software quality; package-level metrics; software modularization; fault-prediction


I. Introduction
Software engineering is aimed at developing mechanism and tools that automates the manual operation.For the assessment of software quality, functional stability and maintainability of its design are prime objectives.Recently, packages have acquired core interest for proper organization of source code entities due to growing complexity of classes in object oriented (OO) source code paradigms.A package is relatively easier to re-use, re-factor and test, eventually reducing maintenance cost [1], [2].There have been increasing efforts to analyze packages and their architecture in object oriented systems to determine quality attributes of object oriented source code [3], [4].Conventionally, software evolution process has been subject to structural and architectural changes in the source code, targeting suitable and organized placement of classes in particular.However, such re-factoring practices can cause drift and deterioration in modularization quality of software [5].Consequently, to insure flexible software modularization, optimization of package structure and their connectivity can be a vital maintenance task.In practice, if quality of package dependencies is evaluated quantitatively, then modifying its structural components to avoid potential flaws becomes easier task.
Although, there have been attempts to improve the modularization of software through heuristic search methods using decomposition techniques and deterministic procedures.These are frequently based on clustering approaches which do not address the source code design issues at precise level of granularity like, classes or packages [6], [7].Furthermore, the existing approaches of software modularization for changing the structure of package entail costly maintenance overhead, complicating its understandability and comprehension [8], [9].Abdeen et al. recently proposed a package level metrics suite which support the modularization of source code architecture using existing package structure, thus without affecting prevalent software design adversely [10].
Despite different existing efforts of proposing metrics to characterize the packages in object oriented systems, there is a need to evaluate capability of these metrics to measure intended quality attributes of software [11], [12].In this paper, we examine the usefulness of package quality metrics proposed by Abdeen et al. in wide spectrum of software quality, i.e., fault-proneness, vulnerability detection, coding standard violation.We also determine the correlation of modularization metrics presented by Abdeen et al. with already studied modularity metrics of different application domains.In this context, First, we develop prediction model for fault-proneness of packages with logistic regression in comparison to traditional Martin package level metrics suite and linear correlation models with post-release faults (reported in standard bug repositories), vulnerabilities and coding standards are examined.Secondly, we form correlation model among different modularization metrics to understand and evaluate the their relationships and its impact over entire modularization design.Experimental results on open source software systems show that: 1) Package Quality metrics by Abdeen et al. are better candidates of faultproneness when used in combination with Martin's metric suite in inter as well as intra releases of software systems; 2) Most of Package Quality metrics have shown reasonable association with actual post-release faults, vulnerabilities and coding standards violations detected through open source tools like FindBugs and PMD; 3) Package level Modularization metrics by Abdeen et al. can be used to evaluate the external strength of overall structural design from the perspective of cohesion, coupling and cyclic/acyclic dependencies.It asserts that improper handling of dependencies at package level can further deteriorate source code causing operational complexity and difficult re-usability.
The rest of the paper is organized as follows.Section 2 explains the motivation of research study in the context of Software Quality.Section 3 describes investigated metrics with their formal definitions.Section 4 provides illustrative example for comprehension of metrics computation.Section 5 presents detailed empirical study with stated research objectives, methodology and obtained results using graphical and tabular representation.Section 6 illustrates most significant related work in the literature of package level fault-prediction.Implication of research study is discussed in Section 7. Threats to validity are explained in Section 8 followed by Conclusion as Section 9.

II. Motivation
Over the years, a variety of quality models (QMs) has been proposed objectively to support the software development, through description, assessment and prediction of software quality [13].These models evaluate quality of software systems using defined metrics.Examples of metrics-based models are Maintainability Index(MI) that determines quantitative value of maintainability [14], Modularization Quality (MQ) that evaluates cluster based cohesion or software architecture [15].However, there is rare study over the package based design of source code for maintenance objectives.In this section, we briefly present the motivational context of our research.To achieve this, we discuss domains of these QMs relevant to our paper.

A. Static Analysis based Quality Assessment
Static analysis is carried out to analyze the source code using mainly using open source tools to discover security vulnerabilities.These vulnerabilities can cause critical system malfunction are economically harmful as well.Clearly, techniques that can reduce occurrence of bugs would be beneficial.To achieve this goal, static analysis tools have been designed to report early warnings and design anomalies.Although, their effectiveness is realized some settings, however, usefulness of warnings generated by them is still unclear.Recently, two static analysis tools FindBugs and PMD have been empirically reported to have less false positives [16] which are briefly introduced below: 1) FindBugs: FindBugs, developed by Hovemeyer et al., is a tool that analyzes the java byte-code against various families of warnings characterizing common bugs in many system [17].The main warnings provided by Findbugs are: null pointer de-reference, method not checking for null argument, close() invoked value that is always null.It actually checks the correctness, bad practice, malicious code vulnerability and performance, etc.
2) PMD: PMD is static analysis tool that finds defects, deadcode, duplicate code, sub-optimal code and overcomplicated expression, was first developed by Copeland et al. [18].PMD operates over Java source code unlike FindBugs which analyzes byte-code.PMD statically warns many patterns, such as, jumbled incremented, return finally block, class cast exception, etc.

B. Prediction based Quality Assessment
These models are usually based on source code metrics or defection detection to estimate number of systems faults, failures chances and maintenance effort.Mostly, software fault proneness prediction is taken as good example of these models.Other examples include Software Reliability growth (SRGM) and modeling of processes associated with software failures [19].

III. Description of Studied Metrics
As a matter of best programming practices, software code should adhere to basic principle of high cohesion and low coupling.However, package optimization process should facilitate any structural change within current design of modularization, otherwise, subsequent decision of modification shall make software system vulnerability prone.Metrics proposed by Abdeen et al. provide an approach for automatic optimization of software modularization by minimizing the cyclic connections (direct cyclic-connectivity) among the packages.There are mainly two suites of proposed package measure, i.e., 1) for quality of single package; 2) evaluation of modularization quality on based cyclic/acyclic dependencies which are described in Tables I and II.• CouplingQ(p): Measures ratio of package providers and clients to all dependencies among and within a package.
• CylicDQ(p) : Measures ratio of class cyclic dependencies within the package to all dependencies of package.
• CylicCQ(p) : Measures ratio package cyclic connections among the packages to all dependencies of package.• CCQ(M): measures common closure of modularization for the classes that change together among the packages.
• CRQ(M): measures common reuse of modularization for package that are are reused together.
• ADQ(M): measures the extent of modularization to which cyclic dependencies between the classes are minimized.
• ACQ(M): measures the extent of modularization to which cyclic connections among the package are minimized .
Table I summarizes formal definitions of above mentioned metrics.All theses metrics are based on relationships established among the software entities through direct/indirect and cyclic/acyclic dependencies established through classes and packages.Abdeen et al. has described utility of these metrics in addressing following optimization challenges in packages.
1) Inter-connections among the classes of large application create complex design and also increase interpackage connectivity.2) Inadequate distribution of classes may result in highly complex afferent coupling on particular package, hence, violating the basic design rules of package, i.e, domain, size and coding practice.3) Minimization of package dependencies may also degrade other packages.
Table II describes modularization metrics proposed by Abdeen et al.These modularization metrics were specifically formulated to address the prevalent modularization limitations which mainly focused on changing the structural shape of software from the scratch.Where as, metrics presented in Table II have goal of multi-objective optimization for improving existing package structure in accordance with well-known design principles.Thus, these metrics bear unique importance in automatic software re-modularization while respecting original design decisions.

Modularity based on dependency cost[25]
prediction studies [20], [21].Table V presents the definitions of modularization metrics studied by Lee et al. [22].The main objective of their research was to analyze and compare the various modularity metrics that have been studied in different domains.We set this research work as baseline to further investigate the applicability of package based modularization proposed by Abdeen et al.Below we summarize the definitions, notations used in definitions and interpretation of these metrics in particular context of study by Lee et al.. [22].They have conducted an experimental evaluation of these metrics on evolutionary software and reported correlation of different modularity metrics and their sensitivities towards particular modular factors.
• M newm : This metric is well known approach for quantifying modularity of social network represented in graphical structures.Recently, there has been extensive focus on application of this metric into studies pertaining different scientific domains, specially, social network, metabolic network, neural network and the World Wide Web.Computation of metric is based on theoretical heuristic that edges (links between nodes) within a module (community) are greater than expected ones.Further, in the definition, i and j are nodes, A i j represents edges between nodes.m is the number of total edges and k i indicates expected number of edges in node i. δ is a comparator function that it outputs 1 where its two parameters are same, 0 otherwise.g i , parameter of δ, represents the module containing node i.This metric ranges between 1 as best value and 0 as worst value.
• M bunch : This metric is normalized version of clustering factor (MQ) introduced by Mancoridis textitet al.. [15].MQ is the most frequently used method for evaluation of a software modularity.µ i is representation of intra-edges of module i, while i, j denotes inter-edges between modules i and j in total number of modules k.
• M g&g : This metric was formulated to measure the modularity of complex mechanical products.However, their application in software systems can be interesting towards incorporating mechanical engineering principles and software design theories.Basically, metric quantifies modularity of physical entities using difference of inter and intra edge densities.In the definition, M is number of modules and N is number of mechanical components (total software nodes in our context of study).The numerator of the fraction consists of two part, the sum of intra-edge density of modules and the sum of inter-edge density of the modules.Symbols, i.e., n k and m k are the indexes of first node and last node respectively in module k.R i j denotes row and column in dependency between node i and j (software nodes).
• M rcc : This metric has its basic application in measuring the modularity of evolving software systems.
Computing the Relative Clustered Cost of software systems is key idea of this metric.In perspective of software architectures, software systems having no dependencies shall bear the value 0 and 1 in case of all inter-dependent nodes.DependencyCost function returns a weighted dependency between node i and j.N is the number total nodes, n is the size of module, λ is a user defined parameter for the metric.The weight varies along with the dependency types.If a dependency between i and j is an intra-dependency of a single module, the weight is n λ , where n indicates the number of nodes in the module.On the other hand, if it is an inter-dependency between separate modules, the weight becomes N λ to have considerable penalty in terms of poor coupling.

IV. Working Example
Consider an example of four packages connected through different dependencies as shown in Fig. 1.To simplify the working mechanism of Abdeen et al.'s metrics, we illustrate dependencies by package p with other packages in modularized design.
In Fig. 1, package p forms four kinds of dependencies with four other packages; Internal dependencies which is within the classes of package p, external dependencies which connect classes of package with other four packages, cyclic dependencies which goes out of classes of package p and return to same classes of package p, cyclic connections which form cycle between package p and other package in the modularization design.

V. Empirical Study
In this section, we describe the methodology used to analyze the metrics on open source software systems.The analysis procedures for empirical evaluation involve descriptive structural information of subject systems, statistical analysis, linear correlation and logistic regression analysis.An overview of data processing steps are introduced in Fig. 2. The first step is to mine the source code of subject systems from repositories and archives.Second step is to apply three pronged static analysis of source code files through: 1) Understand1 , commercial static analysis tool for re-engineering and maintenance the software that provides metrics information of parsed code; 2) FindBugs2 that identifies high priority warnings in the source code; 3) PMD 3 an extensive cross language static code analyzer that reports high priority coding rule violations.
Step three is mapping of post-release faults from bug repositories, dependency based metrics and high priority source code violation warnings against package entities of corresponding source code file to compose the data-sets.In fourth step, data analysis techniques are used to build prediction models and determine the relationships among the modularization metrics.Finally, fifth step reports performance of models.

A. Research Objectives
In particular, our research is directed to explore following implications of inter and intra-package dependencies as salient objectives.
1) Cyclic dependencies among the packages are antipatterns and may cause design flaws or make packages fault-prone.2) Directed dependencies can provide composite view of package coupling and package cohesion which are yet evolving concepts for package entities of source code.3) Automatic package optimization improve overall modularization quality of software.
Above mentioned challenges require evaluation of metrics through rigorous process of software quality.This can be achieved, if described metrics may succeed to provide complementary view of source code, consequently strengthening structural and functional validity of software design through proactive decision making.Another aspect of this study is to investigate relationship among Abdeen's modularization metrics solution based on package optimization and Baseline modularization approaches described in Table III.Our assessment criterion are formed on the basis of notion that proposed modularization can provide better illustration of Baseline approaches through different dimensions (e.g., Cyclic dependencies minimization, intra-package dependencies maximization), eventually, help developer's decisions to adhere with design principles during software development.

B. Experimental Methodology
In this section, we provide a brief overview of statistical techniques and mechanism of their application in our study.

1) Correlation Analysis:
The correlation analysis aims to determine relationship among variables.The correlation coefficient is measure of linear association between two variables.For this purpose, Spearman's rank correlation is widely performed over nonparametric nature of software metrics.The significance of correlation is tested at different levels of confidence interval 95%, i.e., p − value < 0.05.
2) Multivariate Logistic Regression:: Logistic regression is standard statistical modeling technique in which the dependent variable can take on one of two different values: 0 and 1.A multivariate logistic regression model is based on the following relationship equation: Where, X 1 , X2..., X n are independent variables, i.e., characteristics describing the source code(package level metrics), Pr (Y = 1|X 1 , X 2 , ...X n ) represents the probability that the dependent variable Y = 1, i.e., the extent of package predicted as faulty.
3) Classification: Classification methodology is applied to predict weather a package is faulty or not.Various studies have set confusion matrix as benchmark to evaluate the performance of models and analyze the prediction capability of independent variables.From the confusion matrix, following two popular accuracy measures are computed to conduct the evaluation.All our prediction models output probabilities of fault-proneness of package entities.To classify a package as faulty, varying thresholds on probability are utilized.Thus, different choices of threshold will produce varying rates of false positives/negatives (FP/FN) and true positives/negatives (TP/TN).
• Accuracy (Acc.):Measures the proportion of correct predictions.Accuracy is defined as: Acc = T P+T N T P+T N+FP+FN .• Precision (Pr.):Measure of exactness, defines probabilities of true faulty packages to the number of package predicted as faulty.Precision is defined as Pr = T P T P+FP .• Recall (Rec.):Measure of completeness, defines the probabilities of true faulty packages in comparison to total number of faulty packages.Recall is defined as Rec = T P T P+FN .• F-measure (F1.):Measures harmonic mean of precision and recall of predicted model.F1 = 2 * Pr. * Rec.Pr.+Rec. .We build Logistic Regression (LR) model to predict the fault-proneness of Abdeen's metrics (AbdeenMod) and comparative analysis is carried out against the Martin's package level metrics (RM).Further, statistical association is used to discover the impact of dependencies over design anomalies (null pointer de-references, infinite recursive loops, bad uses of java libraries) and common programming flaws (unused variable, unnecessary object creation).The correlation analysis aims to determine significant relationship of each of metric described in Table I with quality attributes of source code.According to recent survey, logistic regression is most commonly used and productive technique for fault-prediction performance in software engineering [26].

C. Data Sets
Basically, there are two different types of data-sets formed for experimental work for each category of metrics.We tend to follow the research based perspectives of taking into account subjects (software systems ) of varying nature, i.e., with large It can be well inferred that meaningful statistical conclusions can be drawn, as data-sets encompass diverse domains of architectural composition.Post-release fault data of subjects was obtained from public repositories, i.e., Eclipse Bug Data9 and PROMISE 10 .For package modularization metrics, experiment study consists of 23 versions of two different open source software systems.JHotDraw11 : is a Java GUI framework for technical and structured Graphics.Ant12 : is a Java library and command-line tool whose mission is to drive processes described in build files.These data-sets have been already studied in comparative modularity analysis by Lee et al. [22].Therefore, we utilized same data for 4 wellknown clustering based modularity metrics in our study part of modularity analysis.However, all the AbdeenMod metrics (package quality and package modularization) were computed by our own scripts developed through Understand-Perl API 13with utmost reliance and incremental testing (Made available public ally at site 14   highest priority identified through PMD 16 in corresponding software.Both of these tools are extensively used in source code analysis with recognition in software industry and research pertaining to software quality [27].

D. Experimental Results
This section briefly illustrates experimental results and analysis on both categories of metrics.In our empirical evaluation, our focus is to obtain maximum possible statistical significance for the research objectives defined in earlier section.
1) Package Quality Metrics: Fig. 3 depict the box-plot to describe the distribution of four package-quality metrics in each data-set.Box-plot range distribution of CohesionQ and CouplingQ lies within 25 to 75 percentile as shown in Fig. 3.It can be observed that CohesionQ metric has low median (value ≈ 0.15) for three versions of eclipse, however, it is almost in even distribution for all versions of jEdit (value ≈ 0.3).On the contrary, CouplingQ values seem to be distributed with quite high median (value ≈ 0.75) for all the versions of data-sets.It depicts the fact that high coupling and low cohesion trend is found in almost all data-sets of our study, employing potential threat of packages being faulty quite high.Similarly, values of CyclicCQ and CyclicDQ are relatively with very high median (value ≈ 0.95) compared to other two metrics.Whereas, distribution of CylicCQ and CyclicDQ range within 25 to 75 percentile as shown in Fig. 3, showing presence of high inter-package cyclic dependencies and connections in all the data-sets.
Table VIII summarizes detailed information of faultprediction model built on the basis of intra-release test and train data-sets.Results obtained from 10 times 10-fold crossvalidation for two models RM and AbdeenMod+RM using LR are presented in Table VIII.It can be clearly observed that there is substantial accuracy improvement while classifying the AbdeenMod+RM model against RM(Baseline) model in most of cases (indicated with ).From Table VIII, we make following observations.First, AbdeenMod+RM models for data-set with higher number of packages have easily outperform traditional approaches of fault-prediction which is case with three versions of Eclipse (2.0, 2.1, 3.0) and Camel-1.6.Second, if data-set contains small number of packages then prediction may result adversely which is the case with jEdit-4.3 and JDTCore-3.4where accuracy is not improved with AbdeenMod+RM model.Third, AbdeenMod+RM model's highest accuracy improvement was reported as 14.7%, making the application of cyclic dependency metrics quite significant for fault-proneness prediction.F1-score is another dimension of performance measure that takes int account both precision and recall.Mean accuracy of successful models in Table VIII has been seen in range of 0.83 ≈ 0.40 (indicated with ).Despite the accuracy improvement in some of cases, F1-score is still unsatisfactory in some cases, e.g., Lucene-2.4,implying lesser precision of AbdeenMod+RM.Generally, prediction result has been improved by 13.3% as maximum value in Eclipse (Larger data-set) and 14.7% as maximum value in jEdit (Relatively smaller data-set).
Table IX presents summary of fault-prediction model using inter-release frame which is more rigorous approach of developing train and test data-sets.Table IX mainly shows values of accuracy measure for fault-prediction model across releases of Eclipse, POI and jEdit.Clearly, prediction results with AbdeenMod+RM achieved competitive accuracy in two cases (indicated with ) as shown in Table IX and is recorded with 31% of maximum improvement.Table X shows analysis carried out over the linear correlation of each Abdeen's Package Quality metrics with number of post-release faults, coding violation and anomalies warnings of priority-1 using FindBugs and source code vulnerabilities arnings of priority-1 using PMD in corresponding package for each dataset.Significant magnitude of association is indicated with ( * ) evaluated at p − value < 0.05 as thresh-hold.Findings of Table X can lead following inferences: First, Cohesion metrics exhibits significantly negative correlation as observed in most of the cases except Lucene and JDTCore.Second, statistical significance of all the package quality metrics was observed frequent in data-sets with larger number of packages, e.g., (Eclipse, POI, Camel).Third, coupling metric CouplingQ exhibits statistically significant magnitude of correlation with post-release faults in most of the data-sets.Fourth, statistical significant correlation of CyclicDQ(p) and CyclicDQ(p) metrics is not witnessed frequently with the exception of Eclipse-2.0.Another objective was to determine the relationship of all AbdeenMod metrics with design flaws and vulnerabilities detected PMD or FindBugs.Interestingly all the AbdeenMod metrics show the positive or negative association, but, their statistical significance is rarely observed in all cases.However, Cohesion and CouplingQ metrics have managed to develop correlation significance to considerable extent with data-sets, like (Eclipse, POI, jEdit), adding an evidence to soundness of study.On the contrary CyclicCQ and CyclicDQ(p) are not found correlated with with FB and PMD.Such findings lead to implication that design anomalies detected through open source tools like FindBugsand PMD have relatively substantial influence over package quality metrics, however, lack impact in comparison to actual post release faults.
2) Modularization Metrics: In Table XI, degree of modularization correlation between Abdeen's modularization metrics solutions and Baseline modularization metrics is presented.Values inside the visualization table show the magnitude of association in corresponding cell of metric at the significant level denoted as: p − values(0.001,0.01, 0.05, 0.1) ⇔ symbols("***", "**", "*", ".").For further illustration, Fig. 4(a) and (b) show overall correlation among the modularity metrics through visualization table representation for JhotDraw and Ant data-sets respectively.Distribution of each variable is shown on the diagonal with bi-variate scatter plot.As described earlier, impact of cyclic or direct dependencies is relatively considered as an anti-pattern, hence, negative correlation can be expected in our analysis.
There are some unique implications which can be formed from modularity analysis of each data-set.For JHotDraw, CCQ(M) ADQ(M) and CRQ(M) exhibit complementary view of strong association with modularity metrics except M g&g .On the contrary, ACQ(M) is found with less association in    4(a) and (b).More importantly, CCQ(M), CRQ(M) and ADQ(M) depict a strong positive correlation with M bunch which is considered as most efficient modularity measure in software quality evaluation.

VI. Related Work
Some research studies have used package level metrics in evaluation of software quality and bug prediction.D'Ambros proposed coupling based technique for package understandability and their evolution [28].Wilhelm et al. proposed package dependencies management and control using Martin and size metrics.Additionally, Reibing utilized these metrics to build Object-Oriented Design Model (ODEM) for formalization of design metrics [29].It is worth mentioning that Abdeen et al. continued their effort to propose metrics for improving modularization, but, scope of their study was for entire modularization not for a single package [30].However, exploring the relationship between package level metrics and external quality attributes is yet an interesting research subject.Gupta et al. presented empirical evaluation for package coupling as indicator of its understandability and maintenance [31].Elish explored utility of Martin metrics as determinants of package understandability.He determined that almost all the Martin metrics have significant correlation with effort required to understand the package design [32].Zimerman et al. collected fault data and complexity metrics for Eclipse releases 2.0, 2.1 and 3.0 at package level [33].They successfully derived fault-proneness of packages by constructing logistic and linear regression models.
As a matter of observation, size and complexity metrics can not produce enough information on fault-prediction.There are certainly opportunities to explore other structural properties of packages to achieve an improved prediction accuracy.Taking this direction, Elish developed comparative inter and intra-release prediction models using CK, MOOD and Martin metrics suite [34].Another notable study in recent time is by Zhao et al. [21], who has further endorsed the utility of package level metrics.He presented an empirical analysis for package modularization proposed by Sarkar et al. [35] metrics as having significant association with faultproneness.All these efforts form the motivation of our study to further investigate dependency based structural properties of packages.In particular, study extends the evidence for role f package based design in improvement of fault-prediction and effectiveness in describing software quality attributes [36], [37].

VII. Discussion
Abdeen et al. extended their research work on the basis of package entities in proceeding years.Admittedly, each of their research effort was unique in terms of application and theoretical rationale.Similarly, modularization metrics described in Table V were studied for correlation among themselves, which required further exploration too.Therefore, our work differs in the context of application to determine modularity of Abdeen et al. in relation with these metrics.Following features makes our study distinctive and useful for research community.
• Adding an evidence of relationship between code metrics and design rules by static analysis tools.
• Empirical evaluation of package entities as effective de-bugging components.
• Providing the rationale that maintenance is multiobjective phase using techniques of modularity enhancement, fault-proneness prediction and source code design improvement.
Aforementioned aspects our this research study basically set novelty and technical significance for assuring quality software design and functioning.This study can help software quality engineers to prioritize their tasks.

VIII. Threats to Validity
In empirical software engineering research, use of open source and limited software projects often account for external threats to validity.However, software systems used in our study are state-of-the-art in research literature of software quality bearing an abstract representation of software structure.Since, dependency information extracted in our study can be characteristics of any industrial based operational software, the reported results can be helpful in forming a generalized opinion to reasonable extent.Although, use of more industrial based software systems can mitigate the threat of external validity.Another aspect of computation which is likely to create threats to construct and external validity, is use of Find-Bugs and PMD.Unfortunately, the performance of these tools are dependent on particular experimental environmental setup and system configuration.Therefore, false positives in static analysis results can't be ruled out.Similarly, measurement of metrics and their experiment through use of tools for analysis and programming have their specific limitation like, language dynamics and precision of statistical computation.However, results were acquired with maximum possible programming reliance and analysis confidence to avoid any internal threats.

IX. Conclusion
In this paper, we evaluated the impact of dependency based Package-Quality metrics towards software quality assurance.Our study validates the computation of these metrics on opensource software.Inter-release and intra-release of predictive models were constructed to conclude following: 1) Dependency based Package Quality metrics in combination with traditional package level metrics provide complementary view of fault-proneness of packages.It employs proper management and organization of packages by minimizing the dependencies can enhance the quality and ensure functional stability of software; 2) Comprehensive assessment of post-releases of software can be obtained with dependency based Package Quality metrics to avoid future potential bugs; 3) Dependency based Cohesion and Coupling metrics value improvement can help to keep the software systems up to coding standards and design rules.
The results indicate that prediction models developed with AbdeenMod+RM metrics achieved reasonable accuracy against RM metrics and traditional metrics are outperformed in faultprediction modeling.Thus, recommendation of collecting dependency information can be significant towards software maintenance.In addition to obtained improved prediction accuracy compared to traditional model, AbdeenMod metrics show notable correlation with number of faults in the packages.These results also indicate that dependencies of source code at package level should be properly managed and resolved to depict the reliable software design.We also investigated the numerical relationship between Abdeen's modularization metrics and widely used modularization metrics.Our statistical computation through correlation matrix indicates that these metrics can be applied in quality assurance framework of object oriented software systems.This study has made vital contribution and provided an insights into early prediction of faults in packages through empirical evidences and statistical validations.In future, we aim to derive more metrics and explore the their relationships in other software quality attributes.In addition to this, comparative analysis using effort aware fault-prediction models is research domain to be explored in future.

Fig. 1 .
Fig. 1.Example of packages having inter and intra dependencies.

Fig. 3 .
Fig. 3. Box plot representation of metrics for all data-sets.

Fig. 4 .
Fig. 4. Box-plot representation of Cohesion and Coupling metrics in each data-sets.

TABLE I .
Description of Investigated Package Quality Metrics

TABLE II .
Description of Investigated Package Modularization Metrics Table IV which have been already used as Baseline in many package level bug

TABLE III .
Description of Notations used Tables I and II Unique provider dependencies of package p. p Cli.P Unique client dependencies of package p. pCyc.D Unique cyclic dependencies produced through classes of modularization.pCyc.Con Unique cyclic dependencies produced using packages of modularization.pi Ext.Out.D Unique inter-package dependencies going out from package p. pi Out.Con Unique inter-package external connections from package p. c jOut.D Inter-class dependencies going outside the package p. pi Out.Cyc.D out-going cyclic dependencies produced through classes from package p. pi Out.Cyc.Con out-going cyclic dependencies produced through packages from package p.

TABLE IV .
Description of Martin Metric Suite The number of concrete, abstract classes and interfaces in the package.Ca Afferent Coupling: The number of other packages that depend upon classes within a package.Ce Efferent Coupling: The number of other packages that other class in a package depend upon.A Abstractness: The ratio of abstract classes in package to total number classes in a package.I Instability: The ratio of Efferent Coupling to total Coupling, I = Ce (Ce+Ca) .
D Distance: The distance from the main sequence: D = |A + I − 1|.

TABLE V .
Baseline Modularization Metrics Studied in Different Domains [23]rence DefinitionModularity and community structure in network[23]

TABLE VI .
Computation of Abdeen et al.'s Metrics

TABLE VII .
Structural Information of Subject Systems ). Table VII provides a summarized description of the datasets in our experimental study.It mainly represents system name with release version, number of total packages, total Number of faults, number of highest priority design flaws detected through FindBugs 15 and source code violations having

TABLE VIII .
Intra-release Prediction Models: Comparison

TABLE X .
Magnitude of Correlation of Package Quality Metrics in Eclipse

TABLE XI .
Correlation Coefficient of Modularity Metrics