The Hybrid Jaro-Winkler and Manhattan Distance using Dissimilarity Measure for Test Case Prioritization Approach

Abstract: A software product line (SPL) is a set of related software products that are developed from a common set of core assets but can be customized to meet specific customer requirements. Integrating SPL techniques into test case prioritization (TCP) can improve its effectiveness: by considering variability across the products within an SPL, test cases can be prioritized according to their relevance to specific product configurations. However, open issues remain, such as how to achieve the highest rate of early failure detection. Among the solutions proposed to mitigate this problem is improving the calculation of string distance with a hybrid technique in order to achieve a high degree of similarity. Dissimilarity-based prioritization (DBP) is the basis for our ranking method. The objective is to identify further weaknesses in the product lines as well as the differences between the experiment and real-world applications. Our focus is on enhancing hybrid techniques so that they produce the highest rate of early failure detection; early fault detection is therefore selected as the performance goal in this paper. To choose the optimal methods for DBP in TCP, several string distance measures were compared. This study proposes three hybrid techniques that combine the Jaro-Winkler and Manhattan string distances, namely New Enhanced Hybrid Technique 1 (NEHT1), New Enhanced Hybrid Technique 2 (NEHT2), and New Enhanced Hybrid Technique 3 (NEHT3). The case study was generated using the PLEDGE tool based on a Feature Model (FM), and six test cases were used in the experiment. The results show the effectiveness of the combination, which achieved a higher degree of similarity for T1 vs. T4, T2 vs. T3, T2 vs. T6, and T3 vs. T6, as well as a perfect degree of similarity for NEHT1 (100.00%). The results suggest that the combination of both techniques improves SPL testing effectiveness compared to existing techniques.


I. INTRODUCTION
A software product line (SPL) is a collection of related software products that share a common set of core assets while also offering variations to address diverse customer needs [1]. The characteristics may be constant throughout all SPL-derived products, or they may be varied and present in only some of them [2]. Instead of building each product from scratch, the SPL approach emphasizes systematic reuse, enabling efficient development and maintenance of multiple products. SPL streamlines development, reduces redundancy, and enhances consistency across products [3]. Many industries implement SPL due to its ability to handle different phases of development using the concepts of commonality and variability [4].
Software product line testing (SPLT) involves testing the shared components and individual product variants within an SPL [5]. It ensures the quality, compatibility, and correctness of both the common core assets and the unique features of each product. This type of testing addresses the challenges posed by varying configurations, shared components, and differing features, while maintaining overall product line quality [6]. As with testing of non-configurable code, testing in SPL is subject to the coincidental correctness phenomenon, which makes it more challenging to detect errors in these systems [7]. Moreover, even the testing of a single software system is a highly difficult and expensive stage of the software development process [8].
Test case prioritization (TCP) is the process of ordering test cases based on certain criteria to optimize testing efforts [9]. Even if a few trailing test cases are not exercised, a prioritized test suite uncovers bugs at the earliest possible time [10]. TCP nevertheless remains a significant challenge in software testing [11]. In the context of SPL, prioritization becomes complex due to the diversity of features and configurations [12]. Researchers have suggested TCP procedures in which test cases are reordered and executed in accordance with a given objective, to boost the efficacy and efficiency of testing [13]. A hybrid approach that considers both similarity and dissimilarity should be adopted for effective TCP in SPL development. Similarity refers to the degree to which two or more test cases share common characteristics or requirements, while dissimilarity refers to the differences between test cases [14]. Techniques such as Jaro-Winkler distance and Manhattan distance can be used to compare test cases for similarity, dependencies, and impact. Prioritizing test cases ensures that critical defects are identified early and that testing resources are allocated efficiently.
SPL has gained prominence in modern software development by allowing the creation of multiple products with shared features and components, and TCP plays a vital role in ensuring the quality and reliability of these products. Hybrid string distance, a combination of various string similarity metrics, presents a promising approach to enhancing TCP in SPL [15]. Hybrid string distance for TCP in SPL faces challenges such as diverse feature sets, low scalability, the need for careful selection of appropriate metrics, and low adaptability to dynamic changes [16]. Managing multiple product variants, selecting appropriate metrics, and ensuring that the hybrid approach remains effective and adaptable to evolving requirements are essential for successful implementation. To optimize outcomes in TCP using hybrid string distance in SPL, several objectives should be pursued, including comprehensive metric selection, feature-driven prioritization, scalable algorithm design, adaptability to changes, and empirical validation and evaluation. The hybrid approach should consider specific product variant features, create a mechanism for prioritizing test cases based on these features, and ensure scalability without compromising performance. Finally, the approach should be tested on real-world SPLs to demonstrate improvements in TCP accuracy, coverage, and overall software quality.
This paper addresses the limitations of current TCP methods that struggle to accurately gauge the semantic similarity between test cases, resulting in less than optimal prioritization outcomes. To tackle this issue, the study poses a research question: "Which new hybrid technique can offer the highest early failure detection rate in TCP?" The research introduces a TCP approach that combines a hybrid of Jaro-Winkler and Manhattan distance with a dissimilarity measure. The main contribution lies in enhancing the precision and efficacy of TCP. The technique is used to overcome the difficulties linked to precisely measuring semantic similarity, and it aims to advance software testing by providing a more robust and dependable approach for early failure detection in TCP.
The following section outlines the relevant literature. Section III presents the proposed approach in detail, including the experimental settings and the combination of string distance measures. Section IV presents the results and discussion, and Section V concludes the paper.

II. RELATED WORK
Incorporating TCP techniques such as reordering test cases based on fault detection rate can significantly enhance the effectiveness of software testing by enabling early fault detection [17]. In the context of TCP for SPL, the term string distance refers to the measurement of the similarity or dissimilarity of the strings that stand in for test cases [8]. With this method, test cases are ranked according to how distinctive or diverse they are from one another in terms of the functionality tested or the areas of the code covered. String distance allows for better resource allocation by identifying redundant or overlapping test cases [18]. According to Halim et al. [1], neglecting string distance would result in inefficient testing processes and delayed bug fixes. Therefore, incorporating string distance in TCP is essential for efficient software development [19,20].
Similarity-based prioritization (SBP) focuses on identifying test cases that are similar to each other based on certain criteria, such as code coverage or functionality [1,21]. The idea behind SBP is that if one test case covers a particular aspect of the software, then similar test cases are likely to cover the same aspect as well [22]. This approach is effective in reducing redundancy in testing efforts by selecting a representative subset of test cases [23]. In contrast, dissimilarity-based prioritization (DBP) considers the diversity among test cases. DBP aims to select a diverse set of test cases that covers different aspects of the software under test [23,24]. By considering dissimilarities between test cases, this approach ensures comprehensive coverage and reduces the risk of missing critical defects [25].
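The idea of DBP can be illustrated with a minimal sketch. The code below assumes that each test case is encoded as a binary feature vector and uses a common greedy "farthest-first" ordering: it seeds the order with the most dissimilar pair and then repeatedly picks the test case whose minimum distance to the already selected ones is largest. The function name prioritize_by_dissimilarity, the example vectors, and the choice of plain Manhattan distance are illustrative assumptions, not the procedure used in the cited works.

```python
def manhattan(u, v):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def prioritize_by_dissimilarity(test_cases, distance=manhattan):
    """Order test-case names so that each pick is far from those already chosen.

    test_cases maps a name to a feature vector (hypothetical encoding).
    """
    names = list(test_cases)
    if len(names) < 2:
        return names
    # Seed the order with the most dissimilar pair of test cases.
    first, second = max(
        ((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
        key=lambda pair: distance(test_cases[pair[0]], test_cases[pair[1]]),
    )
    ordered = [first, second]
    remaining = [n for n in names if n not in ordered]
    while remaining:
        # Pick the candidate whose closest already-selected neighbour is farthest.
        best = max(
            remaining,
            key=lambda n: min(distance(test_cases[n], test_cases[s]) for s in ordered),
        )
        ordered.append(best)
        remaining.remove(best)
    return ordered


# Hypothetical binary feature vectors (1 = feature selected).
suite = {
    "T1": [1, 1, 0, 0],
    "T2": [1, 1, 0, 1],
    "T3": [0, 0, 1, 1],
    "T4": [1, 0, 1, 0],
}
order = prioritize_by_dissimilarity(suite)
print(order)
```

Any other distance function with the same two-argument signature can be passed through the distance parameter, which is how a hybrid measure could be plugged into the same ordering scheme.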
Sulaiman et al. [25] suggested a measurement based on the maximal distance of a dissimilarity measure for SPL, which assures thorough coverage and lowers the possibility of overlooking important faults. That study is based on test cases generated from a statechart, whereas the current work is based on the FM in the context of the SPL domain. Halim et al. [1] suggested rearranging test cases, by improving string distance and prioritizing based on similarity, to increase the rate of fault detection. Their work compared various string distance measures and prioritization algorithms in order to determine the best methods for similarity-based prioritization, using a hybridization of the Jaro-Winkler and Hamming distance equations.
Fault detection has been improved in existing studies via the use of new and enhanced hybrid techniques for string distance equations. Recent work by Pospisil et al. [26] aimed to enhance adaptive random TCP for model-based test suites using the original techniques for Jaccard and Manhattan distance and similarity functions. All of the examined systems achieved improved fault detection performance as a result of the proposed improvement. Another study by Kumar et al. [9] employed Item-based Collaborative Filtering (ICF) to prioritize and decrease the number of products before testing. Hamming string distance was used to calculate the degree of similarity between products. The results of that study show that the approach was able to reduce test suite size. Compared to the works by Pospisil et al. [26] and Kumar et al. [9], the current study concentrates on using a hybrid string distance method to determine the degree of dissimilarity and then locate the distance with the greatest similarity reading.

III. PROPOSED APPROACH
The ranking method we use is based on dissimilarity. Our objective is to find further weaknesses in the product lines being evaluated as well as the points of difference between the test cases and the real world. The study concentrates on the following research question: RQ1: Which new enhanced hybrid technique produces the best early failure detection rate?
We start by outlining the conditions of our experiment before going on to describe the findings.

A. Experimental Settings
The experiment was carried out on Windows 11 with an AMD Ryzen 5 5625U processor running at 2.30 GHz and 8 GB of RAM. The authors developed the New Enhanced Hybrid Techniques (NEHT) by improving string distance using three hybrid techniques to evaluate the comparability of similarity and dissimilarity measures. For the purpose of generating configurations and prioritizing them, the similarity and dissimilarity measures of these techniques are assessed using a Feature Model (FM), the Software Product Line Online Tool (SPLOT), and the Product Line EDitor and tests GEneration (PLEDGE) tool. In SPL, an FM allows for the systematic representation and management of features, their dependencies, and variations across different products [27]. SPLOT is a web-based tool that allows users to create highly dynamic Ajax-based configuration and reasoning user interfaces [28], while PLEDGE is an open-source tool that selects and prioritizes product configurations, maximizing the feature interactions covered [29]. In order to test the SPL, the authors selected an FM for machine learning based on the Global Positioning System (GPS) created by Saini et al. [8], as shown in Fig. 1. Because not all possible feature combinations are viable, feature diagrams are used to limit the variety of a product line. Based on the FM in Fig. 1, the .xml files are produced using SPLOT. The .xml file is then run through PLEDGE to generate the six test cases displayed in Table I. An ordered list of configurations is often the outcome of a sampling method.
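To make the later distance computations concrete, the following minimal sketch shows one possible way to encode a product configuration as a binary vector and as a bit string, and to enumerate the 15 unordered pairs that six test cases produce. The feature names and the two example configurations are hypothetical; they are not taken from the FM in Fig. 1 or from Table I, and the encoding itself is an assumption rather than the output format of PLEDGE.

```python
from itertools import combinations

# Hypothetical feature list; the real FM in Fig. 1 defines its own features.
FEATURES = ["GPS", "Routing", "Map2D", "Map3D", "Voice"]


def encode(selected, features=FEATURES):
    """Binary vector: 1 if the feature is selected in the configuration, else 0."""
    return [1 if f in selected else 0 for f in features]


def to_bitstring(vector):
    """String form of the vector, usable by string distance measures."""
    return "".join(str(bit) for bit in vector)


# Two illustrative configurations (assumed, not from Table I).
t1 = encode({"GPS", "Routing", "Map2D"})
t2 = encode({"GPS", "Map3D", "Voice"})
print(to_bitstring(t1), to_bitstring(t2))

# Six test cases give C(6, 2) = 15 unordered pairs to compare.
pairs = list(combinations(["T1", "T2", "T3", "T4", "T5", "T6"], 2))
print(len(pairs))
```

The vector form suits coordinate-based measures such as Manhattan distance, while the bit-string form suits character-based measures such as Jaro-Winkler.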

B. Hybrid of String Distance
The purpose of the proposed approach is to find the dissimilarity between two test cases. Two string distances, Jaro-Winkler and Manhattan, were chosen to develop the proposed approach. Jaro-Winkler distance is a string distance algorithm that measures the similarity between two strings [30]. It has been widely used in various fields, including TCP. Manhattan distance is a popular metric in TCP that works by first creating a matrix of all possible pairs of test cases [15]. It measures the distance between two points on a grid-like system, where the distance is calculated by adding the absolute differences of their coordinates. In software testing, this metric helps prioritize test cases based on their proximity to each other.
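The two base measures can be sketched as follows. This is a plain implementation of the standard Jaro and Jaro-Winkler similarities and of the Manhattan distance, not the NEHT formulas proposed in this paper; the function names, the prefix scaling factor p = 0.1, and the maximum prefix length of 4 are the conventional defaults and are assumptions with respect to this study.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0  # number of matching characters
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    dj = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return dj + prefix * p * (1 - dj)


def manhattan(u, v) -> float:
    """Sum of absolute coordinate differences of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
print(manhattan([1, 0, 1, 1], [1, 1, 0, 1]))
```

Note that Jaro-Winkler is a similarity (higher means closer), whereas Manhattan is a distance (higher means farther), so a hybrid has to reconcile the two scales.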
The selection of the Hybrid Jaro-Winkler and Manhattan Distance Using Dissimilarity Measure for TCP Approach is grounded in its ability to address the challenges prevalent in existing TCP approaches. The hybrid nature of the chosen method combines the strengths of the Jaro-Winkler and Manhattan string distances, offering a comprehensive solution for accurately capturing semantic similarity between test cases. The integration of a dissimilarity measure further enriches the approach, enhancing the precision of TCP. The decision to adopt this method is motivated by its potential to significantly improve prioritization results and contribute to more effective early failure detection in TCP.
By improving two string distance techniques, this method produces new hybrid techniques aimed at obtaining a faster early failure detection rate. Fig. 2 describes the combinations of the two string distances used to develop the three new enhanced hybrid techniques. New enhanced hybrid technique 1 (NEHT1) modifies the existing Jaro equation, with the Manhattan equation replacing the values of m and t with the values of the test cases (T1, T2). New enhanced hybrid technique 2 (NEHT2) combines the Jaro-Winkler and Manhattan equations: it replaces the value of m with n and the value of t with the values of the test cases (T1, T2), adds the values of the test cases, divides by n, and multiplies by 1 - dj. New enhanced hybrid technique 3 (NEHT3) combines the Jaro and Manhattan equations, replacing the value of m with n and the value of t with the values of the test cases (T1, T2).
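For reference, the standard base equations that the symbols m, t, and dj refer to are reproduced here. These are the textbook definitions of the Jaro similarity, the Jaro-Winkler similarity, and the Manhattan distance, not the NEHT formulas themselves, which are given in Fig. 2. Here m is the number of matching characters, t is the number of transpositions, the prefix length is capped at 4, and p is the prefix scaling factor (conventionally 0.1). Reading n as the number of coordinates in the Manhattan sum is an assumption.

```latex
% Jaro similarity between strings s_1 and s_2
\[
d_j =
\begin{cases}
0, & m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right), & m > 0.
\end{cases}
\]

% Jaro-Winkler similarity, with common-prefix length \ell \le 4
\[
d_w = d_j + \ell \, p \, (1 - d_j).
\]

% Manhattan distance between vectors x and y with n coordinates
\[
d_M(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert .
\]
```

The factor (1 - d_j) in the Jaro-Winkler equation is the same 1 - dj term that appears in the description of NEHT2.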

IV. RESULT AND DISCUSSION
Table II shows the similarity and dissimilarity percentages between pairs of test cases (T1, T2, T3, T4, T5, T6) for the three proposed techniques, NEHT1, NEHT2, and NEHT3. The values range from 0% (complete dissimilarity) to 100% (complete similarity). Since this is an initial result, a single FM is used as the dataset. For NEHT1, T1 vs. T4, T2 vs. T3, T2 vs. T6, and T3 vs. T6 recorded complete similarity (100.00%), indicating that the formula is effective in similarity calculation. The values for NEHT2 and NEHT3 were similar for T1 vs. T4, T2 vs. T3, T2 vs. T6, and T3 vs. T6, which means that both techniques provide a consistent way to determine the similarity level. The majority of the results show that NEHT1 is effective at determining the degree of similarity.
Fig. 3 and Fig. 4 show the similarity and dissimilarity rates of fault detection for the proposed methods (NEHT1, NEHT2, NEHT3). The similarity percentages vary across methods and test cases. Similar variations can be seen in the dissimilarity percentages, which illustrates how the approaches differ in assessing differences. There is no uniform trend in how the methods rank similarity or dissimilarity across all test cases. Some test cases consistently show high similarity across all methods, while others show varying degrees of dissimilarity. According to [1], this kind of enhancement increases the effectiveness of the SBP technique. Sulaiman et al. [25] stated that similarity and dissimilarity strategies were introduced to tackle the scalability problem in current prioritization techniques. This method provides a straightforward, scalable, and efficient way of prioritizing and reducing the number of test cases.
TCP for SPL can be significantly improved by leveraging high similarity in the calculation of string distance. High similarity values are advantageous in locating similar test cases across various SPLs. As a result, fewer testing efforts are duplicated, and existing test cases can be reused. Furthermore, low dissimilarity values can improve coverage, ensure effective bug correction, and improve fault localization. One of the main advantages of SPLT is its ability to save time and resources. Testers can concentrate on the common characteristics shared by all products in the software family rather than testing each product individually. This makes it possible to find and fix errors quickly, which in turn shortens the development process and lowers expenses. For DBP, selecting dissimilar test cases has been shown to be one of the techniques that can speed up the failure detection process.
This research tested six test cases with three proposed hybrid techniques, based on the combination of the Jaro-Winkler and Manhattan string distances, with respect to the early fault identification rate. The findings indicate that NEHT1 has a higher rate of fault identification than the other two proposed techniques. We plan to improve NEHT1 in future work in order to increase its fault identification success rate. In addition, we intend to use a variety of case study types. The limitations of current test case prioritization methods, particularly their difficulty in accurately capturing semantic similarity, make them less suitable for the challenges at hand. Traditional approaches fall short of providing a comprehensive solution for the nuanced characteristics of test cases. The proposed method was chosen to overcome these limitations by introducing a hybrid technique that specifically addresses the semantic aspects of test cases. This choice is aimed at mitigating the deficiencies of existing methods and advancing the field of TCP towards a more precise and effective paradigm.

TABLE II. CALCULATION FOR DEGREES OF SIMILARITY AND DISSIMILARITY